A New AI Lexicon: Artificial Identity Cataracts

Illustration by Somnath Bhatt

Who defines the current AI discourse?

A guest post by Isabel García Velázquez. Isabel is a PhD student in the Department of Thematic Studies, Unit of Gender Studies at Linköping University. Isabel’s research examines caring human-robot interactions through mechanical and human touch in elderly care. Twitter: @IsabelG64687224

This essay is part of our ongoing “AI Lexicon” project, a call for contributions to generate alternate narratives, positionalities, and understandings to the better known and widely circulated ways of talking about AI.

“(…) images see with the eyes of those who see them, only that now blindness is the lot of everyone. You can still see, I’ll see less and less all the time, even though I may not lose my eyesight I shall become more and more blind because I shall have no one to see me” (Saramago 1995, p. 218).

In the language of medicine, cataracts develop when the natural lens of the eye exhibits whitish patches due to a breakdown of proteins. They cause cloudy vision and produce a sensation of seeing ghost objects. Furthermore, if the condition goes undiagnosed, these patches continue to grow and vision deteriorates until blindness. This is what Artificial Identity Cataracts (AIC) do. They follow the same basic principle as biological cataracts: progressively blurring one’s surroundings by colonizing our minds and ways of knowing.

In this piece, cataracts do not refer to the physical eye condition but are used as a metaphor for addressing how weaponized care structures of power and domination in AI technologies obscure human identities. I argue that “Cataract” would be an important lens for articulating the effects of coloniality of knowing and for expanding, or maybe altering, what is understood as bias and discriminatory practices in AI. To illustrate this, I explore some of the structural blind spots that linguistic revitalization tools for indigenous languages encounter. Echoing some of the arguments seen at conferences like ACM FAccT, I use AICs to raise questions such as: how can we think and listen with care when reinventing the narratives and interpretations of new technologies? How can we address the different constructions of knowledge about what AIC (or bias) means to different groups of stakeholders? And how does the (un)consciously normalized asymmetry of AI’s lexicon shape the notion of a just technological future? These questions must be asked, and answered, if we really want to start thinking critically about AI claims.

- — -

Minority indigenous languages are often endangered (Fishman 2002). A case in point is the Saami languages spoken in parts of Norway, Sweden, Finland, and Russia.¹ With the advent of new technologies, AI-driven linguistic tools such as Divvun² and Giellatekno³ have emerged in an attempt to revitalize languages in danger. Along the way, keyboards, e-dictionaries, Intelligent Computer-Assisted Language Learning programs and corpora are created to provide access to Saami (Moshagen et al. 2019). However, cataracts are one of the main obstacles encountered in the process. Companies and/or application developers had (and possibly still have) ideological cataracts that (un)intentionally prevent(ed) them from seeing “uninteresting” languages like Saami. Consequently, operating systems such as Windows, iOS, Android, macOS, and ChromeOS mirror the ideological cataracts of their creators. For example, Saami spelling checkers do not work on Chromebooks, while in Google Docs they lack the red squiggles and right-click functionality. Moreover, Saami diacritic combinations are not properly displayed in text engines (Moshagen & Trosterud 2019) because Unicode provides a standardized list of characters⁴ for the world’s big written languages, while minority language communities were, and still are, hardly visible. This produces huge AICs present not only in programs, services, and functionalities but also in their users.
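The point about diacritic combinations can be made concrete with a minimal sketch, using only Python’s standard `unicodedata` module. It assumes nothing about any particular Saami tool: `describe` is a hypothetical helper, and the `q` example is not a Saami letter but was chosen only because no precomposed code point exists for it. The sketch shows that a common accented letter collapses into a single character, while a combination outside the standardized list stays as two code points that every text engine must stack correctly on its own.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Hypothetical helper: return the Unicode name of every code point."""
    return [unicodedata.name(ch) for ch in text]


# "hállat" typed as a base letter "a" followed by a combining acute accent.
decomposed = "ha\u0301llat"
composed = unicodedata.normalize("NFC", decomposed)

# NFC folds "a" + U+0301 into the single precomposed code point U+00E1.
print(len(decomposed), len(composed))

# Illustrative assumption, not a Saami letter: "q" + combining acute has
# no precomposed form, so normalization leaves two separate code points.
orphan = unicodedata.normalize("NFC", "q\u0301")
print(len(orphan), describe(orphan))
```

Whether such a leftover pair is drawn as one legible letter depends on the font and the rendering engine, which is where support for less widely used orthographies tends to break down.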

As users of majority languages, we take for granted that AI, linguistic tools and speech assistants work in our language. For minority language users, it is not that simple. Lexical terms, accents and dialects shape languages and co-create identities. They are key for understanding who we are as individuals and as members of communities (Mahmoodi-Shahrebabaki 2018). However, if Saami speakers want to use their language in any operating system, or even access linguistic tools such as text analysis, word generation, or text-to-speech software, they need to know which Saami programs are available and then install them by themselves. These are not easy tasks, especially when cataracts block actual use. These cataracts produce ignorance and non-participation in the creation, development, implementation, and nourishing of linguistic tools. Hence, North Sámi words⁵ like sárdnut used in Gáivuotna, hoallat in Skánik, hállat in Kárášjohka and hupmat in Guovdageaidnu, which all mean “to speak,” could become deafening silences of faint ghosts.

A common pitfall when designing linguistic tools to “preserve” endangered Indigenous languages is to reproduce the colonial care discourse rooted in Kipling’s notion of “The White Man’s Burden.” From this perspective, it is the white man’s duty to keep endangered Indigenous languages “alive,” here enacted by using AI to collect information, transcribe languages from remaining speakers, and store the resulting data in cyberspace. As minority groups are not representative of the globe as a whole and AI and data teams are mainly composed of people from dominant groups (D’Ignazio & Klein 2020), there is a risk of excluding other identities and knowledges. That is, restricting the inclusion of indigenous technologists, knowledge holders, language keepers and critical cultural researchers in the design and implementation of linguistic tools causes AICs and increases the unequal distribution of power. This is what D’Ignazio and Klein call the privilege hazard.

Hence, Artificial Intelligence may know indigenous dialects, but its language is still the language of its creator. Whom are these AI systems supposed to serve? How does one care about the Other when their agency is not facilitated, their voice not heard? Are “preserve” and “keep alive” synonyms of “capture”? These are questions I leave open.

Inspired by discussions within decolonial thinking, I suggest that AI has merely cloaked its creators’ desire for coloniality of knowing in an AI lexicon that has been carefully selected in the name of caring for the Other. As the Puerto Rican philosopher and historian Maldonado-Torres explains, colonial machinery and power structures have evolved into more subtle manifestations that are quasi-normalized today. They are an organic part of modern individuals. Following this thread, interactions between humans and AI are a pure reflection of coloniality. As such, AICs serve as a filter to define identities, languages, and knowledge production. In other words, AICs are invisible power nodes that make us see, think, and live “coloniality all the time and every day” (Maldonado-Torres 2007, p.243). Therefore, AI terms like knowledge sharing, digital inclusion, transparency, and accountability demand removing AICs and speaking to diversality (Mignolo 2002) and pluriverse (Escobar 2017). This implies moving from universality to fluid networks of worlds of figures, words and knowledge that follow the principle of complementarity but go beyond the narrow ideal spectrum of dominant identities.

Cataract surgery helps to partially improve our diversality and pluriverse vision. But because cataracts are time-sensitive, context-specific, perspective-dependent and delimited by the ideological eye cataract of the observer, new power cells will eventually grow again, forming coloniality clumps, better known in medicine as posterior capsular opacification. These will require further treatment; otherwise AICs will end up reasserting their existence by adopting the caring-colonizing mantra of “being in the interests of, for the good of, and promoting the welfare of” (Narayan 1995, p. 133) humans.

Congenital cataracts are cataracts present at birth that can affect one or both eyes. Although their prevalence is low, congenital cataracts limit the baby’s sensory information about the world by reducing the clarity and visual stimulus the eye and brain receive. Therefore, the baby grows up organizing and consolidating a hierarchy of knowledges where sharp data are legitimized while blurry traces of the Other are ignored or suppressed. A clear example is that less than half of the Saami population can speak a Saami language, since dominant languages like Swedish, Russian, Finnish and Norwegian are widely used, if not preferred, in media, teaching (see De samiska språken i Sverige 2020 report), and social inclusion programs (Kejonen 2017). The result is that children stop learning Saami languages as their mother tongue (see The Pite Saami Documentation Project). In a similar vein, at a moment when Generation Alpha (the demographic cohort succeeding Gen Z) is technologically connected with AI, a future where new generations are born with AICs does not seem surreal. This futuristic scenario demands we shed light on where the Other stands in this tech-infused circle. And what kind of AICs will future generations develop? Those that give unlimited access to big data colonialism (Couldry & Mejias, 2019) or the ones that are limited to what the human eye can see but allowed to observe raw data in its natural environment?

Artificial Identity Cataracts offer a framework for thinking about how to look for answers to questions such as: who defines the current AI discourse? What is the aim? How do AI terms intervene in the making of the world? And, above all, who/what is counted and who/what is not? Along the way, they help us to see “another” way of addressing AI, a way where AI is curated critically to “make it more care-ful” (Martin et al. 2015, p. 636). This same complexity defines and threatens the fulfillment of AI terms such as transparency, accountability, knowledge sharing, and digital inclusion. Thus, if AICs are not treated early, they not only jeopardize the richness of Saami languages but also lead to a misty oblivion of identities. Indeed, the die is cast for invisible languages like Ume Saami, one of the ten Saami languages, spoken by a minority inside a minority (Siegl 2017). One may say that no matter how advanced artificial intelligence is, if it depends on humans, it will always be designed with a cataract interfering in its/our learning vision.

Footnotes

[1] See The Sami Parliament https://www.sametinget.se/

[2] According to its website, Divvun.org provides proofing tools, keyboards, and dictionaries for a number of indigenous and minority languages. See https://divvun.org/

[3] Giellatekno, Centre for Saami language technology at the University of Tromsø, started as a project for Saami grammatical analysis in 2001. See https://giellatekno.uit.no/, https://giellalt.uit.no/infra/WhatIsThis.html, and https://en.uit.no/forskning/forskningsgrupper/sub?p_document_id=386405&sub_id=386408

[4] See Combining Diacritical Marks https://unicode-table.com/en/blocks/combining-diacritical-marks/

[5] Extracted from https://site.uit.no/sagastallamin/files/2020/01/16_RollUp_Sagastallamin_low-res.pdf in April 2021

Bibliography

Couldry, Nick & Mejias, Ulises. (2018). Data Colonialism: Rethinking Big Data’s Relation to the Contemporary Subject. Television & New Media. 20. 10.1177/1527476418796632.

D’Ignazio, Catherine & Klein, Lauren. (2020). Data Feminism. 10.7551/mitpress/11805.001.0001.

Escobar, A. (2017). Designs for the Pluriverse: Radical Interdependence, Autonomy, and the Making of Worlds. Durham; London: Duke University Press.

Fishman, Joshua. (2002). Endangered Minority Languages: Prospects for Sociolinguistic Research. IJMS: International Journal on Multicultural Societies. vol. 4, no. 2, pp. 270–275. UNESCO. ISSN 1817–4574. www.unesco.org/shs/ijms/vol4/issue2/art7

Kejonen, Olle. (2017). Dual number in the North Saami dialect of Ofoten and Sør-Troms. Uppsala Universitet.

(2021) Lägesrapport: De samiska språken i Sverige 2020. Retrieved from https://www.sametinget.se/156988 Sámediggi/Gïelejarnge Samiskt språkcentrum. ISBN: 978–91–986635–2–5.

Mahmoodi-Shahrebabaki, Masoud. (2018). Language and Identity: A Critique. Journal of Narrative and Language Studies.

Martin, Aryn & Myers, Natasha & Viseu, Ana. (2015). The politics of care in technoscience. Social Studies of Science. 45. 10.1177/0306312715602073.

Mignolo, Walter. (2002). The Geopolitics of Knowledge and the Colonial Difference. South Atlantic Quarterly. 101. 57–96. 10.1215/00382876–101–1–57.

Maldonado-Torres, Nelson. (2007). On the Coloniality of Being. Cultural Studies. 21:2–3, 240–270. 10.1080/09502380601162548.

Puig de la Bellacasa, Maria. (2011). Matters of Care in Technoscience: Assembling Neglected Things. Social studies of science. 41. 85–106. 10.2307/40997116.

Benjamin, Ruha. (2019). Race after technology: Abolitionist tools for the New Jim Code. Polity Press: Medford, MA.

Saramago, José. (2008). Ensaio sobre a cegueira. 45ª imp. São Paulo: Companhia das Letras.

Sjur N. Moshagen, Trond Trosterud, Lene Antonsen and Børre Gaup. (2019). Sami Language Technology for Indigenous Languages: Achievements and Challenges. University of Tromsø — Divvun and Giellatekno.

Sjur Nørstebø Moshagen, Trond Trosterud. (2019). Rich Morphology, No Corpus — And We Still Made It. The Sámi Experience. Proceedings of the Language Technologies for All (LT4All), pages 379–383. Paris, UNESCO Headquarters.

Siegl, Florian. (2017). Ume Saami — The Forgotten Language. Études finno-ougriennes. 10.4000/efo.7106.

(2011) The Pite Saami Documentation Project. Impressions of an endangered Saami language community. The Hans Rausing Endangered Language Project.

Wiechetek, L., & Moshagen, S. (2019). Many shades of grammar checking — Launching a Constraint Grammar tool for North Sámi.