Ethics and AI: Approaches for Addressing Underlying Bias in Data Sets


Artificial Intelligence (AI) is often portrayed as a magical black box: data goes in and something like a self-driving car comes out. Unfortunately, that black box is subject to whatever constraints apply to the data going in. Data of poor quality or insufficient quantity, for example, will produce a self-driving car you want to stay far away from. And left unaddressed, biased data will lead to biased outcomes. Bias in AI can refer to either statistical bias, where sampled values differ systematically from the true values of a data set, or social bias, unfair prejudice against a person or group of people. While math is not biased, humans and human behavior are. As we replace human judgment and decision making with more advanced AI solutions, it becomes clear that algorithms are not free from these same kinds of biases, often producing unintended yet nonetheless harmful consequences.
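The statistical sense of bias can be made concrete with a small, hypothetical sketch: if a sample is drawn in a way that over-represents part of a population, any estimate built from it will be systematically off, no matter how carefully it is computed. (The numbers below are illustrative, not from any real data set.)

```python
import random

random.seed(0)
# A synthetic "population" with a known true mean.
population = [random.gauss(50, 10) for _ in range(100_000)]
true_mean = sum(population) / len(population)

# A non-representative sample, e.g. only surveying the upper tail,
# yields a biased estimate regardless of sample size.
skewed_sample = [x for x in population if x > 55][:1000]
biased_mean = sum(skewed_sample) / len(skewed_sample)

print(f"true mean:   {true_mean:.1f}")
print(f"biased mean: {biased_mean:.1f}")  # systematically too high
```

The gap between the two means is the statistical bias; social bias enters an AI system through an analogous mechanism, when the behavior recorded in the data is itself skewed.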

Stories appear in the news with regular frequency about AI systems that produce sexist or racist outcomes, in some cases with profound and disastrous effects on people's lives. The intent in these cases is not to discriminate; in fact, race or gender is often explicitly removed as a variable. However, removing these features is not sufficient to fix underlying inequities in society's data. Algorithms are designed to detect complex patterns in large data sets, so if a pattern of activity reflects underlying discrimination, the algorithm will pick up on that pattern – biased data in leads to biased decisions out. This is not to say, however, that AI is doomed to discriminate. Steps can be taken to mitigate bias.

Pandata has been working with a major university to develop an AI solution around student engagement: a way to recommend learning and growth opportunities to students, such as clubs, internships, and study abroad programs. The larger goal is to facilitate engagement and enrich the student experience, the campus, and the greater community. A key aspect of this goal is reaching the unengaged – from the kid who would rather play video games all day to the first-generation college student who is unfamiliar with the system and doesn't know the benefits of or expectations around extracurricular participation. A special focus is also placed on diversity and inclusion.

Pandata is developing a Recommender System (à la Netflix or Amazon) to suggest personalized opportunities to individuals based on patterns in their activities and those of similar students. Right away, there is a potential pitfall: if recommendations are based on a student's prior experience or the experience of similar students, what do you do when the student is not already engaged? This is the "cold start problem" – in machine learning, there can be insufficient data at initial training to begin identifying patterns. Statistically, underrepresented minorities are less likely to be engaged; is the Recommender System therefore already perpetuating biases? How can we design and implement a solution that embraces both equality – where all students have an equal chance of being exposed to an opportunity regardless of demographic membership – and equity – where recommendations specifically target groups of interest with the goal of actively increasing diversity without penalizing the dominant group?
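One common way to handle a cold start is a fallback path: when a student has no activity history, recommend from a curated default list rather than pure peer popularity, which would simply mirror the engagement patterns of already-engaged groups. The sketch below is a minimal, hypothetical illustration of that idea – the function names and activity lists are invented for this example, and this is not Pandata's actual system.

```python
from collections import Counter

# Hypothetical curated defaults, chosen with inclusion goals in mind.
CURATED_DEFAULTS = ["first-gen mentoring", "intro research fair", "club expo"]

def recommend(student_history, peer_histories, k=3):
    """Suggest up to k activities for a student."""
    if not student_history:
        # Cold start: no activity signal yet, so fall back to the
        # curated list instead of raw popularity among peers.
        return CURATED_DEFAULTS[:k]
    # Warm start: count activities among similar students that this
    # student has not yet tried, and suggest the most common ones.
    counts = Counter(
        act for peer in peer_histories for act in peer
        if act not in student_history
    )
    return [act for act, _ in counts.most_common(k)]

print(recommend([], [["chess club"], ["debate"]]))  # cold-start fallback
print(recommend(["chess club"],
                [["chess club", "debate"], ["debate", "choir"]]))
```

Even this simple fallback makes the equity question explicit: someone must decide what goes on the default list, and for whom.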

To build a more equal and equitable AI solution, Pandata has developed the "Equity Kernel", a set of rules, algorithms, and metrics aimed at addressing equity and equality in our recommender system's output. It consists of four components:

  • Metrics
  • Reducing bias in the underlying data
  • Reducing bias in a model output
  • Human feedback

The key to determining whether an algorithm is perpetuating bias is first being able to quantify it. Bias must be measured both in the data used to build the initial algorithm and in the model's output. Metrics such as group representation across activity types can be used. And while customization is necessary, powerful open source tools such as IBM's AI Fairness 360 Toolkit provide accessible ways to vet and mitigate bias in AI solutions. Human feedback is also key to success. Capturing stakeholders' thoughts on their goals around inclusion, as well as students' feedback on their experiences, surfaces nuances in the data and design that an algorithm cannot.
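As a hypothetical illustration of quantifying bias in model output, one can compare how often each demographic group is exposed to recommendations. The sketch below computes a simple exposure ratio between two groups, similar in spirit to the disparate impact metric in toolkits like AI Fairness 360 (the data and group labels here are invented; a ratio near 1.0 suggests comparable exposure).

```python
def exposure_ratio(recommended, group):
    """recommended: dict student -> bool (got a recommendation?);
    group: dict student -> 'A' (dominant) or 'B' (underrepresented)."""
    def rate(g):
        members = [s for s in group if group[s] == g]
        return sum(recommended[s] for s in members) / len(members)
    # Ratio of the underrepresented group's exposure rate to the
    # dominant group's; 1.0 would mean equal exposure.
    return rate("B") / rate("A")

recommended = {"s1": True, "s2": True, "s3": False, "s4": True}
group = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
print(f"exposure ratio (B/A): {exposure_ratio(recommended, group):.2f}")
```

A metric like this can be tracked over time as the model and data are refined, turning "is the system fair?" into a number that stakeholders can discuss.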

AI has enormous potential to enrich lives – in this case, the student experience. However, with the increasing complexity of AI applications comes increasing risk. Like so much else in data science and artificial intelligence, designing an equitable AI solution is an iterative process. Awareness that bias is a persistent and pervasive problem, combined with steps to measure, mitigate, and continually refine, can go a long way toward equitable AI.

Hannah Arnson is Lead Data Scientist at Pandata.
