Data bias: the insidious threat lurking in your data unless you’ve taken active steps to mitigate it. It has the potential to harm, skew or invalidate your data completely, or sometimes all three at once.

Our VP for Data Ethics & Governance, Sophie Chase-Borthwick, recently took part in a PICCASO Special Interest Group expert panel, joining William Malcolm (Privacy Legal Director at Google), Radha Gohil (Data Ethics Strategy Lead at Shell), and Anne Woodley (Security Specialist at Microsoft) to drill down into the details of data bias. Here we unpick some of their in-depth discussion.

Understand data bias

Bias in data can take many forms, with bias coming from humans, the data, or even the model itself. Any data governance service, data strategy service or any service which uses data should be aware of bias. Some types are:

  • Sample bias (otherwise known as selection bias): using a sample which isn’t reflective of the population, for example training a facial recognition system only on white men. 
  • Confirmation bias: the tendency to look for or interpret information in a way that’s consistent with your own beliefs, for example scientists selectively analyzing and interpreting data in a way that confirms their preferred hypothesis.
  • Historical bias: when cultural and societal norms have become embedded in systemic processes, for example a model trained on historical data which contains gender biases will carry those biases through to its output.
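As a rough illustration of how sample bias might be checked in practice, the sketch below compares each group’s share of a sample with its share of a reference population. The function name, the group labels and the reference shares are all hypothetical, chosen only for this example; a real check would use your own data and a trusted population source.

```python
from collections import Counter


def representation_gaps(sample_groups, population_shares):
    """Return (sample share - population share) for each reference group.

    A positive gap means the group is over-represented in the sample,
    a negative gap means it is under-represented.
    """
    total = len(sample_groups)
    if total == 0:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }


# Hypothetical data: the group label attached to each of 10 training records.
sample = ["A"] * 8 + ["B"] * 2
# Hypothetical reference shares for the population the model will serve.
population = {"A": 0.5, "B": 0.5}

gaps = representation_gaps(sample, population)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

Here group A is over-represented and group B under-represented by the same margin. How large a gap is acceptable is a judgment call that depends on the use case.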

The impact of bias can range from negligible to significant. Widely reported examples include racial discrimination in Apple’s face recognition technology and Amazon’s secret AI recruiting tool that showed bias against women. Bias often impacts the most vulnerable and marginalized.

Be aware of the ongoing tradeoffs for data

“There are tradeoffs: fairness, accountability, safety,” according to William Malcolm, Privacy Legal Director at Google. “These are key factors in adopting AI solutions but we don’t acknowledge that sometimes they conflict.”

It’s a key point to consider. Sometimes explainability can conflict with accuracy: you might choose a simpler algorithm because it is easier to interpret, but at a cost to the quality of the output. You could add manual checks by humans to increase accuracy, but that risks human bias creeping in.

Calligo’s Sophie Chase-Borthwick observes: “Companies want to use algorithms to determine products and use AI to remove biased human beings. But which one is more or less biased?”

There will always be a threat of data bias. You should stay aware of the ongoing tradeoffs and the implications each one has.

Mitigating data bias

Now we’ve understood what data bias is, it’s time to consider how you can mitigate it. Mitigation could take many forms, such as:

  • Checking there are no infrastructure issues in databases.
  • Being mindful when it comes to data processing to identify any possible sources of bias.
  • Considering which model is the least biased as well as which model would perform well. 
  • Instilling a robust anti-bias culture in your organization, for example training everyone to identify data bias.
  • Monitoring real-world performance throughout your machine learning lifecycle. It’s crucial to never see a model as ‘finished’. There should be continuous monitoring of how well the model is performing.
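To make the monitoring point more concrete, here is a minimal sketch of one possible check: computing accuracy and the rate of positive predictions per group for a batch of live decisions, then flagging large gaps between groups. The record layout, the group labels, the sample batch and the alert threshold are all assumptions made for illustration, not a recommended standard.

```python
from collections import defaultdict


def per_group_rates(records):
    """Compute accuracy and positive-prediction rate for each group.

    Each record is a (group, actual, predicted) tuple with 0/1 labels.
    """
    stats = defaultdict(lambda: {"n": 0, "correct": 0, "positive": 0})
    for group, actual, predicted in records:
        s = stats[group]
        s["n"] += 1
        s["correct"] += int(actual == predicted)
        s["positive"] += int(predicted == 1)
    return {
        group: {
            "accuracy": s["correct"] / s["n"],
            "positive_rate": s["positive"] / s["n"],
        }
        for group, s in stats.items()
    }


def largest_gap(rates, metric):
    """Difference between the best and worst group for one metric."""
    values = [group_rates[metric] for group_rates in rates.values()]
    return max(values) - min(values)


# Hypothetical batch of monitored decisions: (group, actual, predicted).
batch = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 1),
    ("B", 1, 0), ("B", 0, 0), ("B", 1, 0), ("B", 1, 1),
]
ALERT_THRESHOLD = 0.2  # assumed tolerance; set this for your own context

rates = per_group_rates(batch)
for metric in ("accuracy", "positive_rate"):
    gap = largest_gap(rates, metric)
    status = "REVIEW" if gap > ALERT_THRESHOLD else "ok"
    print(f"{metric}: gap between groups = {gap:.2f} ({status})")
```

Running a check like this on every new batch, rather than once at launch, reflects the idea that a model is never ‘finished’. The same per-group figures can also be used to compare candidate models on bias as well as on overall performance.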

There’s no one solution for tackling bias. It’s an ongoing challenge. Throughout the cycle, the biases might keep changing, and so the solution for them must keep changing. 

Humans and machines must work together

In the words of Radha Gohil, Data Ethics Strategy Lead at Shell: “We need humans in the loop for verification when we train and govern a model. Humans have an innate ability to identify cultural nuance in a way that an algorithm cannot.” Microsoft’s Security Specialist, Anne Woodley, agrees: “When working with data, the onus is on humans to set up the right checks and balances throughout the cycle so that when bias creeps in, it can be identified quickly.”

This draws on Article 22 of the EU GDPR, which gives people the right to obtain ‘human intervention’ and to contest a decision made solely by an algorithm where that decision has a legal or similarly significant effect on them. For a data privacy service, this is especially important.
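One way to picture the ‘human in the loop’ idea is a simple routing rule: confident model scores are handled automatically, while borderline or contested cases go to a person. The sketch below is an illustration only. The function name, the thresholds and the outcome labels are hypothetical, and it is not a statement of what any regulation requires.

```python
def route_decision(score, contested=False, lower=0.3, upper=0.7):
    """Decide whether a model score can be acted on automatically.

    score: model confidence between 0 and 1 (assumed scale).
    contested: True if the person affected has challenged the outcome.
    lower/upper: assumed cut-offs; scores between them are borderline.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if contested:
        # A challenged outcome always goes to a human reviewer.
        return "human_review"
    if score >= upper:
        return "auto_approve"
    if score <= lower:
        return "auto_decline"
    return "human_review"


# Hypothetical scores from a model.
for score in (0.9, 0.5, 0.1):
    print(score, route_decision(score))

# A contested case is reviewed by a person whatever the score.
print(route_decision(0.9, contested=True))
```

Where the cut-offs sit is itself a tradeoff: a wider borderline band means more human checks and more cost, and the reviewers need their own safeguards against bias.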

There’s another conundrum to consider. “Ironically, sometimes you can remove some data and it impacts the end of the data. There needs to be a careful balance with the data going in … and the data not going in,” according to Sophie Chase-Borthwick. 

Machines are only as good as the data which is put in, so we should aim to put in the cleanest, most unbiased data possible to get the most actionable and impactful results. 

What’s next for data bias? 

Looking to the future, approaches to minimizing data bias will evolve as and when new AI and machine learning technologies appear. However, new technologies might create new biases themselves.

Data bias is just one facet of the wider picture of data ethics. It’s crucial to maintain rigor and avoid complacency when it comes to any aspect of data ethics. 

In our next blog, we’ll be exploring ‘ethics-by-design’. So do stay tuned. In the meantime, you can get in touch with our team of experts, who can help you minimize data bias and ensure ethical data use and insights.

 

PICCASO Podcast with Data Privacy Panellists from Calligo, Google and Shell

“AI and the Ethical Implications of Bias in Machine Learning (ML) Models”

Available to watch On-Demand

Data Ethics is a major area for consideration in the world of data, governance, privacy and law. Artificial Intelligence (AI) can perform highly complex problem-solving (such as unravelling intricate cancer diagnoses), but it can also suffer major setbacks (such as the potential for racial discrimination).

AI is outperforming humans at narrowly defined, repetitive tasks, which is the space in which it excels. There are, however, some risks associated with AI, and for our panel debate we invited leading experts and thought leaders to help us navigate this complex area.
