Artificial Intelligence is often depicted as the supreme salvation technology, one that would relieve humanity of all wrongs by the virtue of algorithms. While it bathes the world in a new light, AI also casts its shadows. Under the guise of mathematical objectivity and neutrality, algorithmic predictions can be biased and reinforce existing prejudice. Despite indisputable advances, particularly in sectors like healthcare, AI inherits our most undesirable traits. Racial discrimination, misogyny, exclusiveness, socioeconomic elitism, cultural hegemony, and many other human biases creep into algorithms.
No artificial counterpart will heal the world in our stead. An ethical approach to the design of algorithms, focused on diversity and the integration of human values, can, on the other hand, make AI the ally of an enlightened present and a just future. Ethics must be AI’s nervous system.
AI Is Powering Everyday Life
The transformative influence of AI technologies is felt in all sectors. AI predicts cancer and heart attacks. It increases the food supply globally and improves transportation and infrastructure. It tracks down criminals and advises judges. It speaks almost every language and translates those that have been lost. It has read virtually every book, detects “fake news”, and writes news articles. More simply, AI technologies influence which movies we watch, which planes we take, and which highways we drive. AI has become a partner in the experience of ordinary life as well as that of the extraordinary.
What does “Artificial Intelligence” mean? Originally, AI was a challenge: that of emulating human intelligence in machines. In 2020, it designates a set of technologies and methods that aim to win that bet. Machine Learning, Deep Learning, Artificial Neural Networks, Computer Vision, and Natural Language Processing are all subsets of AI. They allow machines to learn from large volumes of data and make real-world predictions to solve specific problems. The nature, origin, quantity, quality, and variety of data used to train algorithms play a decisive role in the performance and evolution of AI.
In highly specialized areas like healthcare, AI already outperforms humans on some tasks. In January 2020, Google Health demonstrated the superior effectiveness of its algorithmic model in breast cancer diagnosis, surpassing the expertise of top radiologists. In the same vein, in June 2019, a collaboration of researchers from MIT and Massachusetts General Hospital trained a model capable of predicting the risk of developing breast cancer over a five-year horizon. As the researchers highlighted, the algorithm demonstrated equal precision for white and black patients. According to Allison Kurian, Associate Professor of Medicine and Research at Stanford University School of Medicine, this was not the case with previous tools. Trained on 90,000 mammograms from a wide variety of patients, the MIT-MGH model stood out for its equity in diagnostic precision. This step is all the more necessary since African-American women have a 42% increased risk of dying from breast cancer, owing to a legacy of structural and financial obstacles in access to prevention and treatment.
AI’s DNA Must Be Scrutinized
The representativeness of populations and their contexts within the data plays a decisive role in algorithmic performance and fairness. Without it, algorithms have the power to automate and increase inequalities. AI researchers and engineers are struggling to solve this thorny problem, in part because they tend to form homogeneous groups subject to cognitive bias. Predominantly white, male, and educated at elite universities, they can lack cognitive diversity in their perspectives. Although there are exceptions, most lack sufficient training in applied ethics and inclusive design methodologies, which are practical approaches to bias elimination.
Despite programs focused on diversity and inclusion, women represent only 18% of the influential fraction of AI researchers worldwide, and 80% of tenured university professors in the field are men.
Such figures are unsurprising given the lack of diversity in the Technology industry. According to reports on D&I released in 2019, only 3.3% of Google employees are black (including 1.4% women), while Facebook and Microsoft barely do better with 3.8% and 4.4% respectively. In the same companies, two-thirds of managers and directors are white men.
Cultural hegemony is also an issue. Although world-class talents can be found globally and Europe has an extraordinary pool of AI researchers, “the countries with the highest number of high-impact researchers (i.e., those within the 18%) are (predominantly) the United States, China, the United Kingdom, Australia, and Canada” (Source: JF Gagne).
Because AI emulates the appearance of neutrality and objectivity, its DNA must be scrutinized. That’s the challenge Joy Buolamwini, MIT doctoral student and founder of the Algorithmic Justice League, took up by uncovering bias in face classification algorithms.
In 2018, she tested commercial gender classification systems and uncovered substantial racial and gender disparities. Buolamwini reported that “darker-skinned females are the most misclassified group (with error rates of up to 34.7%)” while “the maximum error rate for lighter-skinned males is 0.8%”. According to Buolamwini, the performance gap results from the uneven demographic and phenotypic composition of the datasets used to train the algorithms. These findings bring two important facts to light: first, demographic groups are not classified with equal accuracy; second, accuracy appears to be surprisingly developer-dependent.
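The kind of disaggregated evaluation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `error_rates_by_group`, the group labels, and the toy records are hypothetical and are not data or code from the study.

```python
from collections import defaultdict

def error_rates_by_group(records):
    """Return the misclassification rate for each demographic group.

    records: iterable of (group, true_label, predicted_label) tuples.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        if truth != predicted:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

# Hypothetical toy records, invented for illustration only.
records = (
    [("darker_female", "F", "F")] * 7
    + [("darker_female", "F", "M")] * 3
    + [("lighter_male", "M", "M")] * 19
    + [("lighter_male", "M", "F")] * 1
)

rates = error_rates_by_group(records)
# Gap between the worst-served and best-served groups.
gap = max(rates.values()) - min(rates.values())

for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.1%} error rate")
print(f"gap: {gap:.1%}")
```

A single aggregate accuracy figure would hide this gap, which is why the per-group breakdown matters.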
In December 2019, a federal study led by the National Institute of Standards and Technology (NIST) “found empirical evidence for the existence of a wide range of accuracy across demographic differences in the majority of the current face recognition algorithms that were evaluated” (NIST). A total of 18.27 million images of 8.49 million people from four US datasets were processed through 189 commercial algorithms from 99 developers. The datasets included domestic mugshots, photographs of immigration and visa applicants, and border crossing photographs. The study concluded that most algorithms on the market misidentify members of some groups up to 100 times more often than others. “African American women were inaccurately identified most frequently in one-to-many searches, while Asians, African Americans, Native Americans, and Pacific Islanders were all misidentified in one-to-one searches. Children and the elderly were also falsely identified more. In some cases, Asian and African American people were misidentified as much as 100 times more than white men. The highest accuracy rates were generally found among middle-aged white men” (The Verge). NIST reports a variety of causes for those disparities, ranging from the lack of demographic diversity to variation in the quality of photographs in datasets.
The consequences of false negatives and false positives in Facial Recognition are serious. As a result of a false positive, an innocent person could be matched with a criminal profile, or an impostor could be granted access to secure systems and physical areas. At the opposite end of the spectrum, a false negative could deny access to a legitimate visitor.
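To make the two error types concrete, here is a minimal sketch of how they could be measured for a one-to-one face verification system. It assumes the system outputs a similarity score and accepts a match at or above a threshold. The function name `match_error_rates`, the scores, and the thresholds are all hypothetical.

```python
def match_error_rates(comparisons, threshold):
    """Return (false_positive_rate, false_negative_rate) at a threshold.

    comparisons: list of (similarity_score, same_person) pairs, where
    same_person is True when both images show the same individual.
    """
    impostor_scores = [s for s, same in comparisons if not same]
    genuine_scores = [s for s, same in comparisons if same]
    # False positive: two different people are accepted as a match.
    false_positives = sum(s >= threshold for s in impostor_scores)
    # False negative: the same person is rejected.
    false_negatives = sum(s < threshold for s in genuine_scores)
    fpr = false_positives / len(impostor_scores) if impostor_scores else 0.0
    fnr = false_negatives / len(genuine_scores) if genuine_scores else 0.0
    return fpr, fnr

# Hypothetical similarity scores, invented for illustration only.
comparisons = [
    (0.91, True), (0.62, True), (0.85, True), (0.40, True),
    (0.30, False), (0.72, False), (0.15, False), (0.55, False),
]

for threshold in (0.6, 0.8):
    fpr, fnr = match_error_rates(comparisons, threshold)
    print(f"threshold={threshold}: false positives {fpr:.0%}, false negatives {fnr:.0%}")
```

On this toy data, raising the threshold removes the false positive but rejects more legitimate users. Running the same measurement separately for each demographic group would show whether one group bears more of either error.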
The Devil Is In The Details
Obstacles to algorithmic fairness can take many forms. The origins of the data and the probity of its annotation also shape what an algorithm learns.
Created in 2009 by researchers from Princeton and Stanford universities, ImageNet is the global database for training visual recognition algorithms. The base’s 14 million images were hand-annotated via the crowdsourcing platform Mechanical Turk, by an underpaid human workforce, at a pace of 50 images per minute per worker. ImageNet was red-flagged in September 2019 by AI expert Kate Crawford and artist Trevor Paglen. Using the database, Crawford and Paglen trained the ImageNet Roulette model to give Internet users the opportunity to submit their selfies for classification. Some selfies ended up being associated with fanciful labels such as “librarian”, “magician”, “divorced”, “former smoker”, and sometimes discriminatory and offensive ones such as “loser”, “wrongdoer”, “sexy”, “slut”, “negro”, or “slant-eye”.
The ImageNet Roulette project brought the dark side of AI to light: the use of databases annotated with stereotypical, racist, and misogynistic categories that contaminate algorithms down the line. ImageNet has since removed 600,000 images from its database and admitted that 438 labels were “dangerous” and 1,155 “sensitive and offensive.”
Thanks to better governance and value-sensitive design, algorithms will eventually become less permeated by political bias. Some biases are, however, more pernicious than others.
In the justice sector, where algorithms assess criminal risk and recidivism, a ProPublica report on predictive justice revealed inconsistencies that were both detrimental to moderate criminals and advantageous to high-profile felons. Although the shortcomings of algorithmic risk assessment tools in the criminal justice system have been documented, several jurisdictions still mandate their use. A report published by the Partnership on AI highlights the need to ensure “that the tools are designed and built to mitigate bias at both the model and data layers, and that proper protocols are in place to promote transparency and accountability.” The PAI’s concerns relate to the perception of such tools as “objective or neutral.” Despite their consistency and replicability, models can still exhibit bias carried over from human decision-making. “Decisions regarding what data to use, how to handle missing data, what objectives to optimize, and what thresholds to set all have significant implications on the accuracy, validity, and bias of these tools” (source: PAI). In an area like justice where moral judgment is paramount, data-induced bias is counterintuitive and merits attention. While AI has the potential to accurately assess risk and increase safety and security, data is not a neutral material.
Human decision-making is still the cornerstone of AI’s performance. The pressure to rapidly present a Minimum Viable Product to investors can put ethical reasoning and accountability at risk. Ethical assessments should not be sacrificed on the altar of fast turnarounds.
HR is also adopting AI-powered technologies. From automatic resume sorting to psychometric evaluation of candidates and employees, one in two US companies has deployed at least one AI-powered HR system, a share expected to reach two in three in 2020. Facial Recognition and Facial Analysis will be used for video interviewing. Machine Learning will increasingly lead the charge on applicant sourcing and screening. Natural Language Processing will transform Employee Advocacy programs and retention strategies. While AI pervades every HR function, caution should be exercised. Hiring algorithms trained on non-inclusive data have led to unintended discrimination. Measurement of employee engagement via automated emotion analysis can also be biased and inaccurate.
While 69% of HR departments report getting value from the deployment of AI systems, companies must ensure they use fair and trustworthy systems. They should deploy preventive strategies to ensure legal compliance and ethical management of risks associated with AI technologies.
HR practitioners should be trained in the legal and ethical deployment of AI technology. Teams developing or integrating a third-party system should be able to audit the source, the quality and the diversity of training datasets, and regularly test the model for fairness with the assistance of independent researchers.
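As one hedged illustration of what auditing the diversity of a training dataset could look like in practice, the sketch below compares each group's share of a dataset with a reference share and flags groups that fall short. The function name `representation_gaps`, the group labels, the reference shares, and the 5-point tolerance are all assumptions made for this example, not an established standard.

```python
from collections import Counter

def representation_gaps(sample_groups, reference_shares, tolerance=0.05):
    """Compare group shares in a dataset with reference population shares.

    sample_groups: list with one group label per training example.
    reference_shares: dict mapping group label to its expected share (0 to 1).
    tolerance: shortfall allowed before a group is flagged (an assumption).
    """
    if not sample_groups:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    total = len(sample_groups)
    report = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total
        report[group] = {
            "observed": observed,
            "expected": expected,
            "under_represented": observed < expected - tolerance,
        }
    return report

# Hypothetical dataset composition and reference shares.
sample_groups = ["A"] * 80 + ["B"] * 15 + ["C"] * 5
reference_shares = {"A": 0.60, "B": 0.25, "C": 0.15}

report = representation_gaps(sample_groups, reference_shares)
for group, row in report.items():
    flag = "UNDER-REPRESENTED" if row["under_represented"] else "ok"
    print(f"{group}: observed {row['observed']:.0%}, expected {row['expected']:.0%} ({flag})")
```

A composition check like this is only a first step. It does not replace testing the model's outputs for fairness, ideally with independent researchers as recommended above.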
The involvement of professionals from multidisciplinary backgrounds, trained in applied ethics, representative of divergent interests from diverse populations, is the best response to algorithmic bias.
Towards An AI-Powered “Fabric of Felicity”
In 2020, the ethical principles of transparency, fairness, inclusiveness, reliability, and equity still lie fallow, when they should be, in the name of intelligence, the spearhead of the evolution of AI.
Regulators and legislators must work towards the requirements of transparency, equity, and independent evaluation of AI. Organizations must take proactive steps that support the equitable, inclusive, and value-sensitive design of AI. Experts in charge of research, education, strategy, development, implementation, and evaluation of AI-powered systems must reflect human diversity and, ultimately, instill at the heart of those systems the world’s most desirable values, for all.
Aristotle’s doctrine of the mean, Jeremy Bentham’s Utility Calculus, John Stuart Mill’s Utilitarianism, and Auguste Comte’s Altruism may be different approaches to Ethics, but they all aimed at a common target: universal happiness. Guided by “the hands of reason and of law”, AI could rear the “fabric of felicity” Jeremy Bentham meditated upon two centuries ago.
Today, Ethics must be the nervous system of Artificial Intelligence.
This article was authored by Dr. Valerie Morignat, CEO of Intelligent Story, MIT-certified AI Strategist and Machine Learning Expert, and Associate Professor.
Headquartered in San Francisco, Intelligent Story is a business consulting firm accelerating the world’s transition to AI.
A shorter French version of this article was published on January 7, 2020, in the magazine OutreMers360 under the title “IA : la diversité au secours des algorithmes”.