An Introduction to Fair Machine Learning (and its Limits)

This post is part 2 of the KITE blog series on algorithmic fairness.

Blog post by Otto Sahlgren

The writer is a Doctoral Researcher in Philosophy at Tampere University. His research focuses on ethics of AI, philosophy of discrimination and algorithmic fairness.

This blog post is the second part in a series considering algorithmic fairness from both a technical and a philosophical perspective. It is meant to serve industry practitioners as well as those with no technical background but with an interest in ethics. This post discusses the emerging field of research on ‘fair machine learning’, highlighting some key research and technical tools for practitioners while also considering the limits of techno-solutionism when it comes to fairness.

Key take-aways:

  • The field of ‘fair machine learning’ has introduced a variety of tools for identifying and preventing discrimination and unfair treatment in the use of machine learning technologies.
  • These tools have practical value in that they assist in risk management and model development. However, developers should not fall for the myth in the methodology: fairness is a contested concept and cannot be “solved” mathematically or with mere technical interventions.



Let’s start with a simple assumption: AI practitioners want their models to generalize from limited information to novel instances while avoiding adverse outcomes. The first post in this series discussed the issue of algorithmic bias, which may lead to one class of such adverse outcomes: unfair or discriminatory treatment. Depending on the use case, those unfairly treated may comprise (groups of) human beings but also entities such as companies, service providers, product categories, geographical locations, and the like.

The notion of fairness – or justice, more generally – lies at the heart of modern constitutions and human rights frameworks, and indeed of most ethical guidelines for AI design and development (see Jobin et al. 2019). But what does fairness mean in the context of machine learning (ML), specifically? This question is debated, as we will see throughout this series.

To provide a rough characterization, fairness in ML is often understood as the absence of an (illegitimate) effect of an individual’s legally protected or otherwise sensitive attribute (e.g., gender, sexual orientation, skin color, etc.) on how the algorithm processes information or on the outcomes it produces. Most often this means that the outcomes generated by an algorithm are independent of such attributes (or, more specifically, of information about them), and that outcomes are balanced across groups in some way. If the outcomes depend on suspect attributes, the algorithm can be understood as embedding some type of wrongful or unlawful bias.

Fair Machine Learning

The ‘fair ML’ research community has tackled the problem of bias by proposing formal (mathematical and statistical) metrics for measuring discrimination in algorithms, which then inform efforts to mitigate unwanted bias. Indeed, a plethora of fairness metrics has been introduced in the literature – at least 21 and counting (see Narayanan 2018). These formalized definitions of fairness provide model developers with a benchmark for how ‘fair’ an algorithm is. Future posts in this series look at these definitions more closely.
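
To make the idea of a formal metric concrete, here is a minimal Python sketch of one common group fairness measure, the statistical (or demographic) parity difference: the gap between two groups’ rates of positive decisions. The function names and the hiring data are hypothetical and chosen for illustration only; they are not taken from any particular toolkit.

```python
# Minimal sketch of one group fairness metric: statistical parity difference.
# Function names and data are hypothetical, for illustration only.

def positive_rate(decisions, groups, group):
    """Share of members of `group` who received a positive decision (1)."""
    selected = [d for d, g in zip(decisions, groups) if g == group]
    return sum(selected) / len(selected)

def statistical_parity_difference(decisions, groups, unprivileged, privileged):
    """P(decision = 1 | unprivileged) - P(decision = 1 | privileged); 0 means parity."""
    return (positive_rate(decisions, groups, unprivileged)
            - positive_rate(decisions, groups, privileged))

# Hypothetical hiring decisions (1 = hire) for two groups, "A" and "B".
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

spd = statistical_parity_difference(decisions, groups, unprivileged="B", privileged="A")
print(spd)  # -0.5: group B is hired at 25 %, group A at 75 %
```

A value of 0 would indicate parity on this one metric; whether a given non-zero value is acceptable is a question the metric itself cannot answer.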

An estimate of the overall accuracy of a model does not tell us whether the model works equally well across demographic, phenotypic and environmental factors. In fairness auditing, ML models are tested across such factors. For example, facial recognition software might be tested on individuals of different genders or in different humidity and lighting conditions to measure disparities in performance (e.g., accuracy) across these factors. There are already toolkits (e.g., IBM AI Fairness 360, Aequitas, Microsoft Fairlearn, Google’s What-If Tool) and other resources available to guide and help with fairness auditing.
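
At its core, an audit of this kind disaggregates a performance metric by group. The sketch below assumes hypothetical predictions from a face-matching model, split by lighting condition, and shows how a respectable overall accuracy can hide a large gap between conditions. It is a plain-Python illustration and does not use the APIs of the toolkits named above.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} so performance can be compared across groups."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical test set for a face-matching model, split by lighting condition.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 1]
lighting = ["bright", "bright", "bright", "bright", "dim", "dim", "dim", "dim"]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
per_group = accuracy_by_group(y_true, y_pred, lighting)
print(overall)    # 0.75
print(per_group)  # {'bright': 1.0, 'dim': 0.5}
print(max(per_group.values()) - min(per_group.values()))  # 0.5 accuracy gap
```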

Bias mitigation methods

Results of fairness auditing inform efforts to mitigate discovered biases at different stages of development. The technical tools for this task are called bias mitigation methods. They fall into three groups: those that aim to improve model fairness before training (pre-processing) or during training (in-processing), and those that intervene on the outputs of a model that has already been trained (post-processing). Let’s take a brief look at a few of these methods.


Pre-processing methods comprise interventions on the training data (see Kamiran & Calders 2012). In suppression, suspect data categories or items (e.g., sensitive data) are removed from the training data to prevent the algorithm from using such information during modeling. This can also include categories or items that correlate with suspect attributes, so as to prevent discrimination-by-proxy. The training data can also be re-sampled, for example by selecting a more representative subset of the initial training data. Input–output tuples can be synthetically generated or assigned weights (re-weighting) so that, during training, the algorithm can learn the predictive features of underrepresented examples, improving accuracy in low-confidence regions of the data. These methods may prove useful in cases of statistical bias.
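
The re-weighting idea can be sketched in a few lines. The version below follows the ‘reweighing’ scheme described by Kamiran & Calders (2012), in which each training example gets the weight P(group) · P(label) / P(group, label), so that group membership and label become statistically independent in the weighted data. Note that this variant targets unequal base rates between groups rather than low-confidence regions in general. The data are hypothetical, and in practice the weights would be passed to a learner that accepts per-example weights.

```python
from collections import Counter

def reweigh(groups, labels):
    """Weight each example by P(group) * P(label) / P(group, label).

    (group, label) combinations that are rarer than they would be if group and
    label were independent get weights above 1; over-represented ones get less.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    weights = []
    for g, y in zip(groups, labels):
        expected = group_counts[g] * label_counts[y] / n  # count expected under independence
        weights.append(expected / joint_counts[(g, y)])
    return weights

def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels within one group."""
    num = sum(w for g, y, w in zip(groups, labels, weights) if g == group and y == 1)
    den = sum(w for g, w in zip(groups, weights) if g == group)
    return num / den

groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]  # hypothetical historical outcomes
weights = reweigh(groups, labels)
print([round(w, 2) for w in weights])  # [0.67, 0.67, 0.67, 2.0, 2.0, 0.67, 0.67, 0.67]
print(round(weighted_positive_rate(groups, labels, weights, "A"), 3),
      round(weighted_positive_rate(groups, labels, weights, "B"), 3))  # 0.5 0.5
```

Unweighted, group A has a 75 % positive rate and group B 25 %; after reweighing, both groups weigh in at 50 %.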

Sometimes data labels exhibit bias. For example, historical hiring data may show that equally qualified women and men were hired at different rates. Here, re-labeling may do the trick. This method involves checking the labels in the training data for such discriminatory bias and changing them to ensure that similar individuals receive similar outcomes.
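
One simple way to operationalize re-labeling is sketched below, loosely following the ‘massaging’ technique described by Kamiran & Calders (2012). A ranker scores each training example; then the highest-scoring negatives in the disadvantaged group are flipped to positive and the same number of lowest-scoring positives in the advantaged group are flipped to negative, just enough to equalize the two groups’ positive rates. The examples closest to the decision boundary are therefore the ones changed. The function name, groups and ranker scores are hypothetical; a real pipeline would obtain the scores from a model trained on the original data.

```python
def massage_labels(labels, groups, scores, unprivileged, privileged):
    """Relabel training data so both groups end up with (roughly) equal positive rates.

    `scores` are hypothetical ranker outputs: higher means the example looks
    more deserving of a positive label. Returns a new list of labels.
    """
    labels = list(labels)
    idx_u = [i for i, g in enumerate(groups) if g == unprivileged]
    idx_p = [i for i, g in enumerate(groups) if g == privileged]
    rate_u = sum(labels[i] for i in idx_u) / len(idx_u)
    rate_p = sum(labels[i] for i in idx_p) / len(idx_p)
    # Flipping m labels in each group closes a gap of m * (1/|U| + 1/|P|),
    # so m = gap * |U| * |P| / (|U| + |P|).
    m = round((rate_p - rate_u) * len(idx_u) * len(idx_p) / (len(idx_u) + len(idx_p)))
    m = max(m, 0)  # nothing to do if the unprivileged group is not disadvantaged
    # Promote the highest-scoring negatives in the unprivileged group.
    promote = sorted((i for i in idx_u if labels[i] == 0),
                     key=lambda i: scores[i], reverse=True)[:m]
    # Demote the lowest-scoring positives in the privileged group.
    demote = sorted((i for i in idx_p if labels[i] == 1),
                    key=lambda i: scores[i])[:m]
    for i in promote:
        labels[i] = 1
    for i in demote:
        labels[i] = 0
    return labels

labels = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
scores = [0.9, 0.8, 0.55, 0.3, 0.85, 0.6, 0.2, 0.1]  # hypothetical ranker scores
print(massage_labels(labels, groups, scores, unprivileged="B", privileged="A"))
# [1, 1, 0, 0, 1, 1, 0, 0]: both groups now have a 50 % positive rate
```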


In-processing methods focus on reducing bias during the learning process. One method, adversarial debiasing (Zhang et al. 2018), relies on adversarial learning. Here, a classifier is trained to simultaneously maximize predictive accuracy and prevent an adversary (a part of the model) from inferring information about sensitive attributes from its outputs. Other ways to reduce bias during learning are to enforce a fairness constraint on the optimization task (e.g., Zafar et al. 2019) or to add a regularization term to the learning objective in order to enhance fairness (Kamishima et al. 2012).
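
The regularization idea can be illustrated with a minimal NumPy sketch: logistic regression trained by gradient descent on cross-entropy plus a penalty λ · gap², where the gap is the difference in mean predicted probability between two groups. This squared demographic-parity penalty is a simplified stand-in chosen for readability; it is not Kamishima et al.’s regularizer, which is based on a mutual-information measure of ‘prejudice’. The data, function names and hyperparameters (`lam`, `lr`, `epochs`) are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_fair_logreg(X, y, s, lam=1.0, lr=0.1, epochs=2000):
    """Gradient descent on mean cross-entropy + lam * gap**2, where gap is the
    difference in mean predicted probability between groups s == 1 and s == 0.

    The squared parity gap is a simplified stand-in penalty chosen for
    readability; it is not Kamishima et al.'s prejudice remover regularizer.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append an intercept column
    w = np.zeros(Xb.shape[1])
    g1, g0 = s == 1, s == 0
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        grad_ce = Xb.T @ (p - y) / len(y)  # gradient of mean cross-entropy
        gap = p[g1].mean() - p[g0].mean()
        dp = p * (1.0 - p)  # sigmoid derivative at each example
        grad_gap = ((Xb[g1] * dp[g1][:, None]).mean(axis=0)
                    - (Xb[g0] * dp[g0][:, None]).mean(axis=0))
        w -= lr * (grad_ce + lam * 2.0 * gap * grad_gap)
    return w

def predict_proba(X, w):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return sigmoid(Xb @ w)

# Hypothetical data: one feature that is correlated with the sensitive attribute s.
rng = np.random.default_rng(0)
n = 1000
s = rng.integers(0, 2, n)
x = rng.normal(loc=1.5 * s, scale=1.0, size=n)
y = (x + rng.normal(0.0, 1.0, n) > 0.75).astype(float)
X = x.reshape(-1, 1)

gaps = {}
for lam in (0.0, 5.0):
    w = train_fair_logreg(X, y, s, lam=lam)
    p = predict_proba(X, w)
    gaps[lam] = p[s == 1].mean() - p[s == 0].mean()
    print(f"lambda={lam}: parity gap {gaps[lam]:.3f}")
```

Raising `lam` shrinks the gap between the groups’ average predictions at some cost in fit to the historical labels; how to set that trade-off is not something the optimizer can decide.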


Post-processing methods are applied to finished models or, more specifically, to the predictions they generate. These methods are particularly useful when one does not have access to the training data or cannot intervene on a ‘black box’ model. Reject-option classification (Kamiran, Karim & Zhang 2012) is a method in which output labels near a given decision threshold are changed to reduce disparities in outcomes. For example, say an algorithm generates a positive decision (e.g., ‘hire’) when a score exceeds a given decision threshold (say, a probability of 0.8). Reject-option classification looks at the output labels (‘hired’ or ‘not hired’) for scores within a set margin around this threshold (say, 0.75–0.85) and changes them in a way that increases the unprivileged group’s representation in the positive outcome class. There are also other methods for fairness-enhancing post-processing (e.g., Hardt et al. 2016).
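
The reject-option example can be sketched as follows, using the same hypothetical threshold (0.8) and margin (0.75–0.85). In its standard form, the rule gives every score inside the margin the positive label if it belongs to the unprivileged group and the negative label otherwise, while scores outside the margin are thresholded as usual. The function name, groups and scores are illustrative assumptions.

```python
def reject_option_classify(scores, groups, unprivileged, threshold=0.8, band=(0.75, 0.85)):
    """Threshold the scores, except inside the band around the threshold, where
    the unprivileged group gets the positive label and everyone else the negative one."""
    low, high = band
    decisions = []
    for score, group in zip(scores, groups):
        if low <= score <= high:
            # Critical region: the classifier is least certain here.
            decisions.append(1 if group == unprivileged else 0)
        else:
            decisions.append(1 if score > threshold else 0)
    return decisions

scores = [0.95, 0.90, 0.82, 0.60, 0.88, 0.78, 0.50, 0.40]  # hypothetical model scores
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

plain = [1 if sc > 0.8 else 0 for sc in scores]
adjusted = reject_option_classify(scores, groups, unprivileged="B")
print(plain)     # [1, 1, 1, 0, 1, 0, 0, 0]: A hired at 75 %, B at 25 %
print(adjusted)  # [1, 1, 0, 0, 1, 1, 0, 0]: both groups hired at 50 %
```

In practice the width of the margin is tuned: a band that is too wide flips so many labels that the disparity can be reversed rather than removed.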

What metrics can’t capture

Fairness metrics and bias mitigation methods are surely of value for model developers. They enable stricter scrutiny of ML models and help organizations manage risks when using predictive analytics. But what does it mean to “measure” discrimination or fairness? Here is one rough answer: while ‘fairness’ can be measured in different ways as a property of models, it is important to understand that both identifying and preventing discrimination and unfair treatment require attention to social factors outside the model.

First, one should note that ML models discriminate in a generic sense by default, as it were: they search for similarities and differences between individuals and data items and separate them into classes or clusters (those who get hired and those who do not, for instance). The question, then, is how to identify wrongful instances of such conduct. We might turn to the legal concept of discrimination.

Roughly put, prohibitions of discrimination bundle together certain kinds of conduct that undermine institutions and norms that our societies deem just and legitimate (see Shin 2013). For example, direct and indirect discrimination denote different kinds of actions and conduct, but both constitute discrimination (if and when they do). The former is understood as disadvantageous treatment of an individual based on a protected attribute (e.g., gender). The latter is understood as a practice or policy which disproportionately burdens some protected group, even if the policy is not explicitly based on suspect attributes. Furthermore, when seemingly neutral actions or policies reproduce existing inequalities in society or reinforce harmful stereotypes, we might talk of structural discrimination.

Certain cases – such as use of gender as a factor in loan decisions (see previous post) – may be more straightforwardly assessed in this respect. However, other cases are less clear, and not all morally wrongful discrimination may fall under the scope of law (e.g., forms of structural discrimination). As has become clear, ML models can reproduce structural inequalities, even if accurate in the statistical sense.

This raises questions concerning social justice and responsibility: How do we prevent reproduction of systemic inequality in the use of AI? Whose responsibility is it to break vicious cycles of discrimination? How can this be decided fairly?

Legal prohibitions of discrimination also concern a broader range of actions and practices than mere relationships between inputs and outputs, which is what model developers may be inclined to scrutinize most closely. Actions prohibited by those norms include, for example, use of demeaning or derogatory language, hate speech, harassment etc. Explicit orders to discriminate against members of some group in the workplace are considered discriminatory even if those orders are not followed.

This raises complicated questions when we think about racist bots which (learn from humans to) spew hate speech, and humans relying on possibly unfair, non-transparent algorithmic systems in decision-making.

Judicial assessments of discrimination charges also place significant emphasis on context and proportionality. The use-contexts of ML products and their target populations change over time, further complicating matters. It is doubtful whether mere formal and technical tools that abstract such considerations away will – or even can – prove up to the task of identifying and preventing discrimination. (See discussions in Selbst et al. 2019; Wachter et al. 2020.)

The Myth in the Methodology

As noted, (the people using) ML models discriminate by default in a generic sense of the term. The question at the core of algorithmic fairness, then, is this: what makes certain disadvantages (re)produced via algorithmic methods justifiable? Measurements of disparities in algorithm-generated outcomes are informative, and sometimes even actionable, but they do not settle whether the discovered disparities are justified, legally or morally. The belief that they do is what Green and Hu (2018) call the myth in the methodology.

Mythical thinking in algorithmic fairness neglects many ethical questions: Who determines what is an unacceptable amount of bias in an ML model, and how? Should we tolerate 5, 10 or 20 per cent differences in groups’ error rates? How should we balance fairness and accuracy in ML models? Deciding on these issues is a significant exercise of both moral reasoning and power.

If we focus only on algorithms and data when considering fairness, we abstract away many social and contextual factors that should be accounted for in order to prevent wrongful treatment. Moreover, a narrow technical view of fairness may lead to misguided ways of addressing discrimination: ways that are themselves possibly unlawful (see Guidance on AI and Data Protection 2020) or that do not address the underlying problem. For example, to prevent its system from labeling people of color as gorillas, Google removed the label ‘gorilla’ from the system altogether, making it unable to tag gorillas in pictures.

Model developers should not fall for the myth in the methodology: fairness is no math problem. It is a contested concept and different stakeholders will have differing views about what constitutes fairness in a particular case. This does not mean that algorithmic fairness is impossible, however. It only means that answers to these questions lie outside the technology itself.

In Part III, we dive deep into this issue and look at group fairness definitions for ML models, alongside their limitations.

References
  • Green, B., & Hu, L. (2018). The myth in the methodology: Towards a recontextualization of fairness in machine learning. Proceedings of the Machine Learning: The Debates Workshop.
  • Guidance on AI and Data Protection. (2020). Information Commissioner’s Office. URL =
  • Hardt, M., Price, E., & Srebro, N. (2016). Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems, pp. 3315-3323.
  • Jobin, A., Ienca, M., & Vayena, E. (2019). The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1(9), pp. 389-399.
  • Kamishima, T., Akaho, S., Asoh, H., & Sakuma, J. (2012). Fairness-aware classifier with prejudice remover regularizer. Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 35-50.
  • Kamiran, F., & Calders, T. (2012). Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1), pp. 1-33.
  • Kamiran, F., Karim, A., & Zhang, X. (2012). Decision theory for discrimination-aware classification. 2012 IEEE 12th International Conference on Data Mining, pp. 924-929.
  • Narayanan, A. (2018). Translation tutorial: 21 fairness definitions and their politics. Proceedings of the Conference on Fairness, Accountability and Transparency.
  • Shin, P. S. (2013). Is There a Unitary Concept of Discrimination? Philosophical Foundations of Discrimination Law, Sophia Moreau and Deborah Hellman (eds.). Oxford University Press, pp. 13-27.
  • Selbst, A. D., Boyd, D., Friedler, S. A., Venkatasubramanian, S., & Vertesi, J. (2019). Fairness and abstraction in sociotechnical systems. Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 59-68.
  • Wachter, S., Mittelstadt, B., & Russell, C. (2020). Why fairness cannot be automated: Bridging the gap between EU non-discrimination law and AI. URL =
  • Zafar, M. B., Valera, I., Gomez-Rodriguez, M., & Gummadi, K. P. (2019). Fairness Constraints: A Flexible Approach for Fair Classification. Journal of Machine Learning Research, 20(75), pp. 1-42.
  • Zhang, B. H., Lemoine, B., & Mitchell, M. (2018). Mitigating unwanted biases with adversarial learning. Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pp. 335-340.