Illustration by Somnath Bhatt
Transgender erasure in AI: Binary gender data redefining ‘gender’ in data systems
A guest post by Brindaalakshmi. K. Brindaa is an intersectional queer-feminist writer, researcher and advocacy professional. They work at the intersection of human rights, identities, and technology. Their research and advocacy efforts are informed by their work as a queer and trans rights activist and peer supporter working with the LGBTIQA+ community in India.
This essay is part of our ongoing “AI Lexicon” project, a call for contributions to generate alternate narratives, positionalities, and understandings to the better known and widely circulated ways of talking about AI.
Recent conversations around gender and AI have centred on the need to understand gender beyond the binary of male and female. For example, facial recognition technology used by Uber in the US has problems correctly recognising transgender persons. Uber is not an isolated case. The U.S. National Science Foundation, for example, has highlighted research that shows that “facial analysis services performed consistently worse on transgender individuals, and were universally unable to classify non-binary genders.”
According to CNN Business,¹ “The way a computer sees gender isn’t always the same way people see it. A growing number of terms for describing one’s gender are becoming common in everyday life.” The article cites a lack of sufficient training data to recognise transgender persons, and mentions several recent efforts by private companies and the state to include a third gender category, including the recent addition of a third gender category on driver’s licenses in several US states. “As these societal changes proliferate, AI-driven conclusions have become more than a gender identity concern.”
Indeed, in 4 Ways to Address Gender Bias in AI,² Josh Feast notes that incomplete or ‘skewed training datasets’ are created when demographic categories are missing from the training data: “Models developed with this data can then fail to scale properly when applied to new data containing those missing categories.” And many models fail to include data on transgender persons.
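Feast's point about missing categories can be illustrated with a minimal Python sketch. It assumes a hypothetical toy encoder: the names `build_encoder` and `encode`, and the sample records, are illustrative only and come from neither the cited article nor any real library. The encoder learns its categories from training records that contain only 'male' and 'female', so a record carrying a category absent from training cannot be represented at all.

```python
def build_encoder(training_values):
    """Map each gender category seen in training to an integer index."""
    categories = sorted(set(training_values))
    return {category: index for index, category in enumerate(categories)}


def encode(encoder, value):
    """Return the index for a known category, or None if it was never seen."""
    return encoder.get(value)


# Hypothetical training data that records only binary categories.
training_genders = ["female", "male", "female", "male"]
encoder = build_encoder(training_genders)

# New data containing a category that was missing from training.
new_records = ["female", "transgender", "male"]
encoded = [encode(encoder, value) for value in new_records]

# Records the encoder cannot represent because their category was never seen.
unrepresented = [value for value in new_records if encode(encoder, value) is None]
```

In this sketch the unseen category is silently mapped to nothing, which is one simple way a system can "fail to scale" to people it was never built to count.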
While non-binary gender identities have long been present in global south countries, the struggles of gender minorities persist. With respect to gender and AI, these struggles begin with the process of recognising one’s gender identity through data. Presently, gender data from the global south remains incomplete and invisible. It gets lost between two pressures: local activism that seeks to expand the understanding of gender rights beyond the binary of male and female in order to include centuries-old indigenous realities, and the simultaneous erasure of that hard-won (indigenous) gender recognition simply to remain relevant and be included within the western development paradigm. The broad western category of LGBTQ+ subsumes, and often even erases, the experiences of transgender persons with indigenous identities from global south countries with a colonial history, such as India.
With this as the context, this essay focuses on gender data and the severely lacking frameworks for guaranteeing transgender rights. Global conversations around the gender gap only account for the gap between men and women. The continued invisibilisation of transgender persons, caused by the lack of data on their (non-western) realities, keeps them in a data vacuum.
I specifically explore transgender data rights in the context of India and its recent (lacking) efforts to create a more inclusive data framework. I argue that we should account for the historical marginalisation faced by this population due to the pathologisation and criminalisation of their gender identity that began during British colonisation in India. Gender data currently accounts only for individuals who identify within the binary genders of male and female, which leads to the continued exclusion of transgender persons who identify beyond the binary. I also delve into the unique situation in India (and many other South Asian countries): the recognition and challenges of a third gender category, as well as the challenges faced by transgender persons who identify within the binary genders. This essay pushes for data systems to reimagine gender data so that it accounts for individuals beyond the binary, based on an individual’s right to self-identify their gender without the need for medical certification or complicated bureaucracy.
Pathologisation: selective agency to enter data systems
In most countries, individuals are required to provide proof of name, address, and gender (assigned at birth) in order to access welfare benefits. Yet these documents, and the systems they interact with, rarely acknowledge transgender rights, and many transgender people are left without access to these systems. Further, in many countries, it is difficult or even illegal for transgender persons to change their legal gender.
According to the third edition of ILGA’s Trans Legal Mapping Report (a global report on the status of legal gender recognition of transgender persons),³ only 96 countries have legal processes to allow transgender persons to change gender, with only 25 countries described as not having prohibitive requirements. Even with the declassification of gender dysphoria as a mental disorder, a demand for medical certification in some form to change identification documents continues across many countries. The need for medical validation and authorisation, a form of continued pathologisation, affects the understanding of individual agency and an individual’s ability to self-identify their gender in different identification documents and to enter data systems as a valid human being in order to access their rights. This continues to alienate transgender experiences from the collective understanding of gender, from gender as a data category, and consequently from gender data. With the increased push to digitise identification processes to enable interoperability of data systems, the inability of an individual to enter a data set due to the lack of a valid ID or a data mismatch also affects the size of the gender datasets that may be used to train automated decision-making systems using Artificial Intelligence (AI) and machine learning.
Decolonising the understanding of (trans) gender in data systems
In addition to pathologisation, transgender experiences from the global south are often misrepresented due to the superimposition of a western understanding of being transgender, which often remains within the binary gender framing of trans woman and trans man. Additionally, while there are many global development data efforts to bridge the gender gap, including SDG 5, the Human Development Report 2019,⁴ and Data2X,⁵ it appears that transgender identities are presently only included within the umbrella of sexual orientation and gender identity (SOGI), and are not considered under gender while assessing the gender gap.
However, in the context of India, understanding transgender identities and gender data can be complex, and does not fit the homogenised understanding of being transgender or LGBTIQ+. The imagination of global development data efforts leaves no room for the indigenous transgender identities beyond the binary genders that are unique to India. India has a strong presence of indigenous transgender identities such as hijra, kinnar, and jogappa, among others. British colonial laws have had a significant impact on the rights of transgender persons in the country. Transgender persons were subject to state surveillance using the Criminal Tribes Act of 1871, which called for “surveillance and control of certain criminal tribes and eunuchs.”⁶ It is important to take this history of suspicion and surveillance into account while understanding gendered experiences specific to transgender persons in the region and its continued impact on the present realities of transgender persons. This surveillance persists even today with the discriminatory demand for a medical certificate only from transgender persons (to identify within the binary genders). Cisgender persons are not subject to such unreasonable requirements. Using the colonial ideas of pathologisation and criminal suspicion, the state continues to question the human-ness of transgender persons when they seek to procure a valid identification document as a human being. By way of such vile practices, the state continues to control certain bodies and to decide whether they are sufficient to even exist as persons, let alone enter a data system, training data sets, and the imagination of AI.
In the section below, I build on work I did for the study ‘Gendering of Development Data in India: Beyond the Binary’,⁷ using India as a case study to trace the structural and systemic challenges with transgender data and the resultant rights exclusions faced by transgender persons identifying within and outside the gender binary. While the insights drawn below are from India, they are largely representative of the current state of global approaches to development data, understandings of gender, and consequently, gender equality.
Status of genders beyond the binary in India’s development data
Building on Feast’s idea of missing data and data categories, a central issue with AI systems is that they are trained to make decisions based on large-scale data sets that contain a limited view of individuals. Training AI systems to account for transgender persons would require equally large data sets on transgender persons. In India, the data sets available on transgender persons (e.g., the 2011 census data)⁸ are skewed and insufficient to automate decision-making systems that will be inclusive of transgender realities. This limited data has resulted in insufficient budgeting and a lack of programmes in several Indian states. Understanding data exclusion requires a closer look at the law, policy, and practice in India. Historically, transgender persons have faced severe discrimination and marginalisation in India.
- Enumeration of transgender persons: Prior to 2011, the Indian census did not recognise transgender persons. Census 2011 was the first census to enumerate transgender persons in India, but only under the gender category of ‘Others.’ This general classification has left it unclear who fits into this category, and has resulted in several issues with the data collected (e.g., undercounting). This flawed dataset is being used as the primary data for fund allocation across different states for transgender people’s inclusion, leading to misallocation and under-allocation of funds for development priorities addressing the needs of transgender persons across India.
- Legal framework: Subsequently, in 2014, the Supreme Court of India for the first time recognised the right of an individual to self-identify their gender as male, female, or transgender.⁹ This verdict also detailed nine directives to be implemented for the inclusion of transgender persons in the country. However, these directives were not uniformly implemented across the country. Further, in 2019, despite severe opposition¹⁰ from the transgender community, the Transgender Persons (Protection of Rights) Act 2019¹¹ was passed — supposedly to protect the rights of transgender persons. In reality, this new law demands a medical certificate for transgender persons to identify within the binary.¹² This disproportionately affects the ability of transgender individuals to disclose their gender identity.
- Policy: Even with efforts to institutionalise transgender rights into law, there is often a large gap between what the laws say and how (if at all) they are implemented. Based on the census data, some states, such as Meghalaya, cited the absence of transgender persons as the reason for not having a policy.
Another repeated issue is the move to digital welfare services alongside a lack of digital literacy and low literacy among transgender persons owing to a high rate of school dropouts.¹³ There are no provisions for transgender persons with no internet access, smartphone, or digital literacy. The choice is between an offline process that exposes them to the prejudices of government officials and an online process with no support. The disproportionate emphasis on data shifts the burden of proof onto transgender individuals, even under dire circumstances, for the sake of inclusion. This shift also reflects a change in priority from human rights to data.
- Practice: Transgender persons face marginalisation in accessing their basic rights, including but not limited to healthcare, housing, banking, education, and employment. Given the stigma around being transgender in India and the discriminatory nature of the new law, the mandatory requirement for any individual to first identify as transgender on their ID before changing to a binary gender¹⁴ will further discourage individuals from disclosing their identity. In essence, the new law will worsen the situation for transgender persons in the country: their enumeration, their access to constitutionally guaranteed human rights, and the size of the datasets that are fed into automated decision-making systems.
Even with India’s attempts to include transgender persons, the use of skewed training datasets is evidently creating issues of scaling due to significant data mismatches. Until 2017, the application form for a Permanent Account Number (PAN) did not have a third gender category. However, transgender persons whose digital ID (Aadhaar) recorded their gender as transgender were expected to link it with their PAN. The system did not allow this linking because of the gender category mismatch. This was resolved only after the Supreme Court of India ordered the inclusion of an additional gender category.¹⁵ Evidently, the welfare system in India has not been structured or equipped to be inclusive of transgender persons, resulting in ‘skewed training datasets’ being fed into Automated Decision Making Systems (ADMS). The Government of India recently admitted to having no plans to pass a law¹⁶ or regulate the use of Artificial Intelligence in India. As noted by Feast, such ‘skewed training datasets’ will lead to scaling challenges; in the case of a welfare state, this translates into implementation challenges. The use of skewed datasets in an unregulated environment, with no law to regulate AI or to protect an individual’s privacy, is likely to worsen the status of transgender persons in the country.
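The category mismatch described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two registries, their category lists, the `can_link` function, and the applicant records are all hypothetical and are not taken from the actual PAN or Aadhaar systems, whose internal schemas are not described in this essay.

```python
# Hypothetical category lists for two registries; not taken from any real schema.
REGISTRY_A_GENDERS = {"male", "female", "transgender"}
REGISTRY_B_GENDERS = {"male", "female"}


def can_link(gender_in_a, genders_allowed_in_b):
    """Linking succeeds only if registry B can store the value held in registry A."""
    return gender_in_a in genders_allowed_in_b


# Hypothetical applicants, keyed by an illustrative ID, with the gender recorded in registry A.
applicants = {
    "applicant-1": "female",
    "applicant-2": "transgender",
    "applicant-3": "male",
}

# With a binary-only registry B, the transgender applicant cannot be linked.
rejected_before = [
    applicant for applicant, gender in applicants.items()
    if not can_link(gender, REGISTRY_B_GENDERS)
]

# Adding the missing category to registry B removes the mismatch.
updated_b_genders = REGISTRY_B_GENDERS | {"transgender"}
rejected_after = [
    applicant for applicant, gender in applicants.items()
    if not can_link(gender, updated_b_genders)
]
```

The sketch shows why the exclusion is structural rather than individual: nothing about the applicant changes between the two runs, only the set of categories the second system is willing to store.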
Conclusion: Need to reimagine gender as a data category
Presently, the onus continues to be on the marginalised to contend with systems and initiatives that fail to identify people beyond the binary genders.¹⁷ For example, the explicit mention of women and girls in the framing of SDG 5 encodes gender inequality in leading development frameworks, making it harder to hold states accountable. With the advent of AI-based technology systems and big data for decision-making, transgender people face amplified challenges of pathologisation and criminalisation. Because AI and machine learning systems generate algorithms from voluminous data related to specific keywords, the reframing of the word gender becomes critical. The understanding of gender needs to be expanded to include gender-diverse individuals without explicit mentions of specific identities. More importantly, these frameworks need to include transgender realities from the global south. The reframing needs to be based on human rights, moving away from the need for representation, which often tends to demand data.
Simultaneously, gender data use should be limited and re-evaluated based on each specific use case. As a respondent of the study on India¹⁸ notes, “Gender data itself has to be seen in the context of what we are looking at. What is the data that we are collecting and how are we using it? For some other things, gender may not be significant at all.” A Human Rights-Based Approach to Data, released by the UN OHCHR, highlights the need to allow self-identification, especially on gender identity,¹⁹ due to the extreme discrimination that transgender individuals face. Given the complex challenges faced by transgender persons, disconnected offline data systems made it easier for individuals to access their rights.
Sexual orientation and gender identity are two distinct data categories, and the two data sets should be dealt with differently.²⁰ There is an urgent need to explicitly separate these two data categories in the context of inclusion of gender-diverse people, including but not limited to those who identify beyond the gender binary of female and male, in an effort to better redefine gender and gender equality.
As with facial recognition technology, it is important for the global north to continue to demand better systems that recognise transgender persons. But these conversations need to go beyond the age-old narrative that is representative only of the developed world, to also include indigenous non-binary gender realities from the global south. These communities continue to face far greater struggles due to the historical marginalisation thrust upon them by the western concepts of colonisation and pathologisation. Without a conscious effort to redefine the understanding of (trans)gender away from a global north imagination, gender development in the global south will continue to force-fit itself into a western imagination to fulfil an unrealistic development (funding) agenda far from its everyday reality.
 Metz, R. (2019, November 21). AI software defines people as male or female. That’s a problem. CNN Business. Retrieved from https://edition.cnn.com/2019/11/21/tech/ai-gender-recognition-problem/index.html
 Feast, J. (2019, November 20). 4 Ways to Address Gender Bias in AI. Harvard Business Review. Retrieved from https://hbr.org/2019/11/4-ways-to-address-gender-bias-in-ai
 Wareham, J. (2020, September 30). New Report Shows Where It’s Illegal To Be Transgender In 2020. Forbes. Retrieved from https://www.forbes.com/sites/jamiewareham/2020/09/30/this-is-where-its-illegal-to-be-transgender-in-2020/?sh=30523b0a5748
 UNDP. (2019). Human Development Report 2019. UNDP. Retrieved from http://hdr.undp.org/sites/default/files/hdr2019.pdf
 Grantham, K. (2020, March). Mapping Gender Data Gaps: An SDG Era Update, Executive Summary. Data2X. Retrieved from https://data2x.org/wp-content/uploads/2020/03/MappingGenderDataGaps_ExecSummary.pdf
 Eunuch is a derogatory term used to describe transgender persons assigned male at birth.
 Gendering of Development Data in India: Beyond the Binary is a qualitative study authored by Brindaalakshmi. K for the Centre for Internet & Society, India as part of the Big Data for Development Network, supported by IDRC, Canada https://cis-india.org/raw/brindaalakshmi-k-gendering-development-data-india
 Brindaalakshmi. K (2020). Gendering of Development Data in India: Beyond the Binary #3 Identity Documents and Access to Welfare. Centre for Internet & Society, India Retrieved from https://cis-india.org/raw/brindaalakshmi-k-gendering-development-data-india
 Supreme Court of India. (2014). NALSA Vs Union of India, 2014. https://indiankanoon.org/doc/193543132/
 Lalwani, V. (2019, November 27). What next for transgender people, as India clears a bill that activists call “murder of gender justice”? Quartz India. https://qz.com/india/1756897/indias-transgender-rights-bill-disappoints-the-lgbtq-community/
 Transgender Persons (Protection of Rights) Act 2019. Retrieved from https://prsindia.org/files/bills_acts/bills_parliament/The%20Transgender%20Persons%20(Protection%20of%20Rights)%20Act,%202019.pdf
 According to Section 7 of the Transgender Persons (Protection of Rights) Act 2019, individuals are required to undergo sexual reassignment surgery (SRS) and procure a medical certificate to be able to change their gender within the binary of male and female. Although the final rules of the law have broadened the understanding of a medical certificate to include a certificate from a therapist, the insistence on medical validation continues to violate an individual’s agency.
 The rules of the Transgender Persons (Protection of Rights) Act 2019 include an online option to change an individual’s identity document.
 The rules of the Transgender Persons (Protection of Rights) Act 2019 detail a two-step process for individuals to change their name and gender on their identification documents. The process is as follows:
- Step 1: On verifying this application, the District Magistrate will issue a certificate of identity and a transgender identity card to the applicant. State governments are expected to have a register for the issuance of the certificate of identity and the transgender identity card. Based on this transgender identity card, the said individual can change their name and gender to transgender on all other identification documents including Aadhaar.
- Step 2: In order to change one’s gender within the binary of male or female, individuals are expected to go through another application with a certificate from a medical practitioner.
 Express News Service (2018, March 29). Supreme Court steps in as transgenders face trouble with Aadhaar-PAN linking. The New Indian Express. Retrieved from https://www.newindianexpress.com/nation/2018/mar/29/supreme-court-steps-in-as-transgenders-face-trouble-with-aadhaar-pan-linking-1794028.html
 Parliament of India, Lok Sabha website (2021, December 08). Retrieved from http://loksabhaph.nic.in/Questions/QResult15.aspx?qref=30045&lsno=17
 Scolaro, B. (2020, June 24). LGBTI and the Sustainable Development Goals: Fostering Economic Well-Being. LGBTQ Policy Journal, Harvard Kennedy School. Retrieved from https://lgbtq.hkspublications.org/2020/06/24/lgbti-and-the-sustainable-development-goals-fostering-economic-well-being/
 See footnote 10.
 United Nations Human Rights Office of the High Commissioner (2018). Guidance Notes on Approach to Data. OHCHR. Retrieved from https://www.ohchr.org/Documents/Issues/HRIndicators/GuidanceNoteonApproachtoData.pdf
 LGBTFunders. (2019). Data Collection. Retrieved from https://lgbtfunders.org/resources/best-practices-for-foundations-on-collecting-data-on-sexual-orientation-and-gender-identity/