Getting the data to include everyone

Silvana Fumega, PhD
Research and Policy Director
ILDA

We often speak of data as an input for algorithms, but it is also the product of the standards and legal frameworks that shape its production. On the one hand, something that sounds as technical as developing a data standard is an exercise that allows us to rethink how data is produced and used, as well as the existing problems with biases in its creation. On the other hand, algorithms learn from data and build their representation of reality from it. Therefore, data production is also at the center of all discussions about what is called artificial intelligence.

Without the right categories, the right data cannot be collected and, increasingly, without the right data, it becomes difficult to create policies that offer solutions to different groups and people, or to bring about broader social change. Thus, there is a need for accountability and inclusiveness when it comes to data production.

Take, for instance, our work on femicides at ILDA. If there is no clear data on certain aspects that can characterize a femicide, we may fail to account for those cases and therefore neglect these victims when designing and implementing policies and other measures. Similarly, when exploring data on violence against the LGBTQ+ community in Central America, it is evident that, without data on gender, sexual orientation and other variables, it is very difficult to understand how serious the problem is becoming in that region (or in any other, for that matter).

However, there is a caveat: although making these populations and their problems visible through data can help governments and civil society actors address their needs, doing so also poses a risk to vulnerable populations, as it could accelerate trends towards discrimination and exclusion (ILDA, Data for development the road ahead, p. 19).

Default perspective

In this context, we must pay close attention, for both data and algorithms, to whose perspective is taken as the default one (D’Ignazio and Klein, Data feminism, chapter 1). Almost always, the perspective taken into account is that of those who occupy the most privileged positions in the field (people or companies that may also have incentives to exploit data and develop algorithms), just as it is in our society in general. This privilege makes some populations invisible in data sets, algorithms, and visualizations, to name a few examples.

What do we mean by biases?

Power and data are related. Societal and power structures create biases: inferences based on a prejudice or a preconceived idea that stems from a specific worldview. Biases are multi-layered and they manifest in different ways. We carry our own biases and experience them in our daily lives. They can be related to gender, race, age and class, among others, and can result in different types of discrimination. (See: Brandusescu, Canares, Fumega. ¿Estándares de Datos Abiertos a puerta cerrada?) As mentioned, these biases are embedded in the processes by which data is produced and, in many cases, in the standards that guide that production.

What do we mean by biases? There are different meanings, depending on the context. Examples include cognitive biases (a particular characteristic of a subject, which affects the way they perceive reality) or, when talking about AI, algorithmic biases, which are systematic and repeated errors that create unfair results, such as arbitrarily granting privileges to one group of users over others. These biases can be identified at different stages of the data production process, from problem definition to data collection and preparation, and finally when the data is used and you realize (sometimes a little late) that something is off.

A long way ahead

The above discussion may seem quite technical, but it certainly has an impact on all of our lives, especially those of the most disadvantaged parts of the population. This is particularly important when data and algorithms are part of decision-making processes that affect people, since, for example, a statistical pattern that applies to the majority may not be valid for a minority group. This is one of the main reasons we discuss and analyze this type of data and processes.
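
To make the point about majority patterns concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the records are invented, the group labels are placeholders, and the function name accuracy_by_group is hypothetical. It only shows how a rule that looks reliable on aggregate can fail for a small group once the results are broken down.

```python
from collections import defaultdict

# Hypothetical, made-up records: (group, rule_prediction, actual_outcome).
# The rule is assumed to have been tuned on the majority and applied to everyone.
records = (
    [("majority", 1, 1)] * 85    # rule correct for most of the majority
    + [("majority", 1, 0)] * 5   # rule wrong for a few in the majority
    + [("minority", 1, 1)] * 3   # rule correct for a few in the minority
    + [("minority", 1, 0)] * 7   # rule wrong for most of the minority
)


def accuracy_by_group(rows):
    """Return overall accuracy and a dict with the accuracy for each group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, predicted, actual in rows:
        totals[group] += 1
        hits[group] += int(predicted == actual)
    overall = sum(hits.values()) / sum(totals.values())
    per_group = {g: hits[g] / totals[g] for g in totals}
    return overall, per_group


overall, per_group = accuracy_by_group(records)
print(f"overall: {overall:.2f}")
for group, acc in per_group.items():
    print(f"{group}: {acc:.2f}")
```

With these invented numbers the rule is right for 88 of 100 people overall, but for only 3 of the 10 people in the smaller group. The aggregate figure hides that gap, which is why disaggregated data and the categories needed to produce it matter.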

At the end of the day, these kinds of concerns are important because biases affect people's lives, both when we make decisions (our own choices as consumers) and when we are the subject of other people's decision-making. From consuming information about politics, to the benefits to which we are entitled, to not receiving certain opportunities just because we belong to a certain demographic group, we, as a society, must be aware of the implications of the data we produce and consume. We are still learning to deal with biases and mitigate them. There is a long way to go, but the first step is to become more aware of these dangers and implications.