Pre- and Post- Fairness Processing for Black-Box Classifiers






Machine learning algorithms increasingly support decision-making systems in contexts where outcomes have long-term implications for the subject's well-being. At issue is whether an algorithm's outcomes are unfair and depend on demographic characteristics -- race, age, gender, religious or political beliefs -- that are irrelevant to the task. Empirical evidence indicates that a wide range of applications deliver different experiences depending on the demographic characteristics of the client. This study focuses on two types of approaches to mitigate potentially unfair outcomes of a classifier while making no assumptions about the classifier itself: (i) pre-processing to remove biases encoded in the data; and (ii) post-processing to audit whether a classifier's outcomes meet a given fairness criterion. In fair pre-processing, we focus on methods in unsupervised fair representation learning that extract from data their underlying latent factors while removing dependencies between latent variables and sensitive attributes. We make four contributions to fair representation research. First, we recast fair representation learning as a rate-distortion problem and show that an encoder that filters out information redundancies also removes dependencies between sensitive attributes and representations. This insight motivates FBC, Fairness By Compression, a compression-based approach to unsupervised fair representation learning that achieves state-of-the-art performance in terms of the fairness-information trade-off. Second, we implement a single-shot fair representation learning method, SoFaiR, that allows the user to explore the entire unfairness-distortion curve at test time with a single trained model. SoFaiR adjusts the fairness/information properties of a representation at test time by masking bits in the tail of the bitstream. This reduces computational costs compared to existing methods in fair representation learning, which require the user to retrain a model to explore different points on the fairness-information plane. Third, we posit that for image data, sensitive attributes like gender or race are likely to be abstract concepts, while a high-quality reconstruction of images requires encoding high-resolution details. Therefore, a rate-distortion approach to fair representation learning needs to model both low- and high-resolution latent variables. To test this hypothesis, we encode images into a hierarchy of quantized latent variables. Empirically, we find that only deep hierarchies, independently of model capacity, can generate representations orthogonal to the sensitive attributes while maintaining both low- and high-resolution information about the images. Fourth, we derive necessary and sufficient conditions for a representation learned from a finite sample to offer fairness guarantees that generalize to any downstream user and to the infinite-sample regime. The condition requires that, for any distribution over the feature space, the encoder induces a distribution over the representation space such that the $\chi^{2}$ mutual information between features and representation is finite. Lastly, for both fairness pre-processing and auditing, it is reasonable to assume that the classifiers that use the data are black boxes that neither auditors nor data controllers can access. In this context, we develop an auditing approach, mdfa (Multi-Differential Fairness Auditor), that verifies whether a classifier is nearly mean-independent of sensitive attributes within any subset of the feature space that is computationally identifiable from a finite sample.
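The tail-masking mechanism can be illustrated with a toy progressive (bit-plane) code. The quantizer below is a hypothetical stand-in for a learned entropy coder, not the actual SoFaiR implementation: it simply shows how zeroing trailing bits of a bitstream yields a coarser, lower-information representation without retraining anything.

```python
import numpy as np

def to_bitstream(z, n_bits=8):
    """Quantize a latent vector in [0, 1) to a bitstream, ordered by
    bit plane: most significant plane first, so the stream's tail
    holds the finest reconstruction details."""
    levels = (z * (2 ** n_bits)).astype(np.uint32)
    planes = [(levels >> b) & 1 for b in range(n_bits - 1, -1, -1)]
    return np.concatenate(planes)

def mask_tail(bits, n_masked):
    """Zero out the last n_masked bits of the stream; the coarse
    (leading) bits survive, the fine (trailing) bits are dropped."""
    out = bits.copy()
    if n_masked > 0:
        out[-n_masked:] = 0
    return out

def from_bitstream(bits, dim, n_bits=8):
    """Invert to_bitstream: reconstruct the (possibly coarsened) latent."""
    planes = bits.reshape(n_bits, dim)
    weights = 1 << np.arange(n_bits - 1, -1, -1)
    levels = (planes * weights[:, None]).sum(axis=0)
    return levels / (2 ** n_bits)

z = np.array([0.71, 0.25, 0.5])
bits = to_bitstream(z)
# Drop the two finest bit planes (6 bits): a coarser representation
z_coarse = from_bitstream(mask_tail(bits, 6), dim=3)
```

In this toy, the tail bits hold only fine reconstruction detail; how SoFaiR arranges its bitstream so that masking the tail also improves fairness is a property of the trained model, not of the bit-plane ordering alone.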



Fairness, Machine learning, Rate-Distortion, Representation Learning