Gender- and race-related bias in AI systems

Since its invention in the 1970s, face recognition has made giant strides forward. Today it is considered to be the most natural of all biometric measurements. And for good reason – we recognise ourselves not by looking at our fingerprints or irises, for example, but by looking at our faces.

In the past few years, we have also seen a manifold rise in the adoption of face recognition technology – from tagging photos on Facebook and unlocking smartphones to controlling access to company buildings and mass surveillance. But concerns are mounting over citizens’ civil rights, privacy, and access to a fair and thorough investigative process as more law enforcement agencies adopt this technology as a tool and as the companies that develop it propose new ways for law enforcement to use it.

Last week, the controversy over the ethical and legal use of this AI technology was back on the front page when two MIT researchers published an update to their previous work on Gender Shades, an algorithmic audit of gender- and skin-type-related bias in commercial facial recognition technology. This report even prompted the “MIT Technology Review – The Algorithm” newsletter to dedicate this week’s edition to unpacking the issue of AI bias in face recognition. Following the trend, in this article I summarise the key findings of two related but separate studies on this topic (including the Gender Shades paper) referred to in the MIT newsletter.



Paper 1 – Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products, by Inioluwa Deborah Raji and Joy Buolamwini, MIT (link)

This is a study of gender- and race-related algorithmic bias in five leading commercial face recognition systems, and of the impact of algorithmic auditing on fairness and transparency in commercial AI systems.

The key findings of this study, summarised in the table below, indicate that while Microsoft’s and IBM’s face recognition systems have significantly improved in their ability to identify an individual’s gender (since the last Gender Shades audit, which was made public around mid-2018), Amazon’s system, Rekognition, which was not part of the earlier study, had much more difficulty telling the gender of female faces and of darker-skinned faces. Rekognition made no mistakes when identifying the gender of lighter-skinned men, but it mistook women for men 19 percent of the time and mistook darker-skinned women for men 31 percent of the time.

[Table summarising the audit results by company and subgroup. Source: MIT Technology Review]

The key takeaway from this study is that within seven months of the original audit in 2018, the three companies in scope of that review had significantly improved their face recognition systems, with reduced accuracy disparities between male and female subjects and between darker- and lighter-skinned subgroups. For example, the recent review found that Microsoft’s technology mistook darker-skinned women for men just 1.5 percent of the time, a remarkable improvement over the 21 percent error rate in the prior audit.

The paper therefore concludes that by highlighting the issue of classification performance disparities and amplifying public awareness, the study was able to motivate companies to prioritise the issue, yielding significant improvements within a relatively short period.

From this study, it is very clear that as more and more businesses and government agencies adopt AI, “algorithmic auditing” will have a significant role to play in ensuring fairness and transparency in commercial AI systems.
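To make the idea of an algorithmic audit concrete, here is a minimal sketch in Python of its core computation: measuring a gender classifier’s error rate separately for each intersectional subgroup and reporting the disparity. The column names and the tiny dataset are hypothetical placeholders, not data from the actual audit.

```python
# Minimal sketch of the core computation behind an audit such as Gender Shades:
# per-subgroup error rates and the gap between best- and worst-served groups.
# The data below is a made-up placeholder, not the audit's actual dataset.
import pandas as pd

# Hypothetical audit results: one row per test image.
results = pd.DataFrame({
    "skin_type": ["lighter", "lighter", "darker", "darker", "darker", "lighter"],
    "gender":    ["male", "female", "female", "male", "female", "female"],
    "predicted": ["male", "female", "male",   "male", "female", "male"],
})

results["error"] = results["gender"] != results["predicted"]

# Error rate per (skin type, gender) subgroup.
subgroup_error = results.groupby(["skin_type", "gender"])["error"].mean()
print(subgroup_error)

# Disparity: difference between the worst- and best-served subgroups.
print("Error-rate disparity:", subgroup_error.max() - subgroup_error.min())
```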

Paper 2 – Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure, MIT (link)

The authors of this paper propose a new algorithm to deal with the problem of modern machine learning systems’ vulnerability to bias, especially bias due to hidden, and potentially unknown, under-represented classes in the training data – i.e. each class (e.g. darker male, darker female, lighter male, lighter female) does not make up an equal portion of the training dataset. Imbalanced datasets degrade the performance of machine learning techniques because the overall accuracy and decision making are biased towards the majority class, which leads to minority-class samples being misclassified or even treated as noise.
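As a toy illustration of this point (not taken from the paper), the short sketch below trains a simple classifier on a 95/5 imbalanced dataset: overall accuracy looks healthy while recall on the minority class collapses.

```python
# Toy demonstration of class imbalance: high overall accuracy can hide
# very poor performance on the under-represented class.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)

# 950 majority-class samples vs 50 minority-class samples, overlapping features.
X_maj = rng.normal(loc=0.0, scale=1.0, size=(950, 2))
X_min = rng.normal(loc=1.0, scale=1.0, size=(50, 2))
X = np.vstack([X_maj, X_min])
y = np.array([0] * 950 + [1] * 50)

clf = LogisticRegression().fit(X, y)
pred = clf.predict(X)

print("Overall accuracy:", accuracy_score(y, pred))      # looks high
print("Minority-class recall:", recall_score(y, pred))   # much lower
```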

This new algorithm works without any human supervision: when integrated with an existing machine learning model (such as a face recognition system), it identifies under-represented examples in the training dataset and subsequently increases the probability with which the learning algorithm samples these data points (in other words, the system spends extra time looking at them to compensate). Experiments carried out by the authors indicate a significant reduction in gender bias when this technique is applied to the problem of face recognition.
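The sketch below illustrates only the adaptive resampling idea, in simplified form and under my own assumptions: it takes some pre-computed feature representation as given and uses a crude histogram-based density estimate, whereas the paper’s actual method learns a latent representation of the data and resamples based on that learned structure. The function name and parameters are hypothetical.

```python
# Simplified sketch of debiasing by adaptive resampling: estimate how "rare"
# each training example is in some feature space, then sample rare examples
# more often. This is an illustration, not the paper's full method.
import numpy as np

def debiasing_sample_probs(features: np.ndarray, n_bins: int = 10,
                           alpha: float = 0.5) -> np.ndarray:
    """Return a sampling probability per example, boosting low-density regions."""
    density = np.ones(len(features))
    for d in range(features.shape[1]):
        # Histogram-based density estimate along each feature dimension.
        hist, edges = np.histogram(features[:, d], bins=n_bins, density=True)
        idx = np.clip(np.digitize(features[:, d], edges[1:-1]), 0, n_bins - 1)
        density *= hist[idx] + 1e-8
    # Rare (low-density) examples get larger weights; alpha softens the effect.
    weights = 1.0 / (density ** alpha + 1e-8)
    return weights / weights.sum()

# Usage: draw training batches with these probabilities instead of uniformly.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 4))      # stand-in for learned features
probs = debiasing_sample_probs(features)
batch_idx = rng.choice(len(features), size=32, p=probs)
```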

Class imbalance in training datasets is a huge challenge in machine learning. Although there are a number of techniques that data scientists currently use to deal with it (for example, SMOTE), the solution proposed by the authors appears far more robust and intuitive. Perhaps this is just the beginning of a new generation of algorithms that deal with the problem of class imbalance?
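For comparison, here is a brief sketch of the SMOTE approach mentioned above, using the imbalanced-learn package: rather than re-weighting the sampler, it synthesises new minority-class points by interpolating between existing ones. The dataset is the same made-up 95/5 split used earlier.

```python
# SMOTE oversampling with imbalanced-learn: the minority class is grown by
# synthesising interpolated samples until the classes are balanced.
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (950, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([0] * 950 + [1] * 50)

X_resampled, y_resampled = SMOTE(random_state=0).fit_resample(X, y)
print(np.bincount(y), "->", np.bincount(y_resampled))   # [950 50] -> [950 950]
```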

