Bias in AI in 2020
Anna-Maria Meck / Voice User Interface Architect at VUI.agency / 26.08.2020
AI is shaping many of today's and even more of tomorrow's technologies. As powerful as AI is, we need to keep in mind that it is only as good as its teachers and the data we feed it.
With AI controlling various aspects of our lives, we need to make sure it is inclusive, pluralistic, and treats everyone equally. But we are not there yet: from facial recognition working worse for People of Color to living in filter bubbles shaped by algorithms, we – the Tech Community as well as our Users – see Bias in AI everywhere.
To get to grips with this issue, it is essential to know where Bias in AI stems from. With this blog post, I want to shed light on the three major components that contribute to biased AI systems. So, let's break the problem down into three digestible chunks and go from there:
People – who feed an AI
Data – what we feed an AI
Algorithms – how an AI digests its food
It is unavoidable that we are all biased. We each grew up under unique conditions. We speak certain languages and have a particular cultural background. We have met people and are shaped by diverse experiences, or by the absence of them. We take our socialization with us wherever we go. And we do not leave it at the door of the workplace.
This is not a problem per se, but here comes the but: let's say you have a dozen people working on a Voice Assistant or any AI-driven technology, and they are a homogeneous crowd. They most probably share extensive parts of their cultural understanding, a common language, and a common belief system. Their learned biases amplify each other and leave a mark on their work, as they are the ones selecting training data, training algorithms, and performing pre- and post-processing of data. We know that women make up only 25% of the tech workforce in Europe while accounting for 51% of the overall population, and that in the US, Hispanic women hold an astounding 1% of computing occupations while making up 17% of the population.
The field of people developing AI is therefore in large part a non-diverse one, which sets up and reinforces biases in systems using Artificial Intelligence.
Of course, bias in the workplace is not reserved for AI and tech – there are many professions composed predominantly of either men or women: nurses, construction workers, kindergarten teachers, bus drivers. The difference is that AI is shaping many of tomorrow's technologies and is being used for decision-making processes that no longer involve humans. If an AI decides whether I can get a loan (or an algorithm decides whether the automatic soap dispenser in a hotel will actually provide me with soap), we need to make sure that everybody is treated fairly and equally.
People are a huge part of what makes data a key factor in Bias in AI, as their selection process is already biased. Nonetheless, there are other dimensions to the data part of the problem. First things first, though: data is of course a key ingredient of AI and machine learning – it basically is the meal we feed it. Machine learning works with patterns and will roam the data to find them. The more frequently a pattern comes up, the more it will manifest in an AI. The AI will try to match any given input to one of these patterns. The more unusual your input is, the worse your AI performs.
Applying this to the Voice Space: an ASR system is trained on a data set comprised predominantly of High German speakers. A speaker from Austria (while being a native speaker of German) who uses the resulting Voice Assistant will be met with poor recognition.
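This frequency effect can be sketched with a toy model. The labels and the 90/10 split below are purely illustrative, not real corpus figures: a model that simply learns the most frequent pattern scores well overall while failing the minority group entirely.

```python
from collections import Counter

# Toy training set: 90 High German samples vs. 10 Austrian German samples
# (hypothetical labels; the split is illustrative only).
training_labels = ["high_german"] * 90 + ["austrian_german"] * 10

# A naive "model" that always predicts the most frequent pattern it saw.
majority_label = Counter(training_labels).most_common(1)[0][0]

def predict(_sample):
    return majority_label

# Overall accuracy on data with the same 90/10 skew looks fine ...
test_labels = ["high_german"] * 90 + ["austrian_german"] * 10
accuracy = sum(predict(x) == x for x in test_labels) / len(test_labels)

# ... but accuracy on the minority group alone is zero.
minority = [x for x in test_labels if x == "austrian_german"]
minority_accuracy = sum(predict(x) == x for x in minority) / len(minority)

print(accuracy)           # 0.9
print(minority_accuracy)  # 0.0
```

A real ASR model is vastly more complex, of course, but the incentive is the same: optimizing an aggregate metric rewards fitting the majority pattern at the expense of rare ones.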
Now: why not choose a more diverse training data set, then? In fact, you might be faced with the reality that there simply is none.
Many existing and widely used data sets are discriminatory in that they predominantly contain one-sided data, e.g. from male speakers. To give you an example: TED talks have repeatedly been analyzed by speech scientists for broader purposes, but their speakers are 70% male. Studies furthermore find that “almost all big data sets” generated by systems using Machine Learning are biased. So even with the best of intentions, you may not be able to work bias-free with an existing database.
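Before training on an existing corpus, a quick sanity check of its demographic balance can at least make such skew visible. A minimal sketch, assuming per-speaker metadata is available (the field values and the 70/30 split are hypothetical, echoing the TED figure above):

```python
from collections import Counter

# Hypothetical speaker metadata for a speech corpus; in a real project this
# would come from the corpus' annotation files.
speaker_genders = ["male"] * 70 + ["female"] * 30

counts = Counter(speaker_genders)
total = sum(counts.values())
shares = {group: n / total for group, n in counts.items()}

# Flag any group that falls below a project-specific representation floor.
FLOOR = 0.4
underrepresented = [g for g, share in shares.items() if share < FLOOR]

print(shares)            # {'male': 0.7, 'female': 0.3}
print(underrepresented)  # ['female']
```

A check like this does not fix the bias, but it forces the imbalance into the open before the model inherits it.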
After an AI has been fed, it starts digesting its data with the help of algorithms to make sense of what it was supplied with and to make decisions based on what it has learned. These algorithms also pose a threat to a non-biased approach to AI.
AlgorithmWatch, a German non-profit evaluating algorithmic decision-making processes, found that Instagram’s algorithm favors posts displaying a greater amount of nudity over those showing more fabric. Amazon built an AI-driven hiring tool that was supposed to facilitate hiring top people for a given job. But the tool had been trained on applications from previous successful recruitments, which mostly came from men, and it therefore discriminated against women.
The algorithm PULSE generated a great deal of heat when it turned a pixelated picture of Barack Obama into that of a white man.
And even if an algorithm depicts reality correctly in that, e.g., there are more female than male nurses, the picture of a man in scrubs should not automatically be captioned “Doctor” while that of a woman is captioned “Nurse”. This simply does not do justice to the idea of a plural, diverse, and inclusive society we want to work towards.
People. Data. Algorithms. Bias in AI is actually a set of different biases which together form a pressing problem with this technology. It is amplified with every use and is hard to overcome due to a lack of transparency, traceability, and awareness of the problem. With the prospect of AI taking over more and more important decision-making processes, the tech community needs to develop a consciousness of Bias in AI and work towards solving it. AI needs to enhance current decision-making processes, not reproduce and reinforce the errors and inaccuracies already lingering in our society.