How to Meaningfully Visualize Imperfect Race Data

09.13.24

By: Alexandra Baker, Research & Data Analyst II, and Maria T. Khan, Research & Data Analyst II

Data plays an increasingly vital role in all kinds of decision-making, from where we eat lunch to how much funding our school district receives. As data analysts and users, we all have a responsibility to root our work in the most authentic stories represented by data. Paying close attention to the limitations of race data is crucial. The largest data collection agency in the United States, the Census Bureau, has a history of racial politics and xenophobia.

In fact, it can be incredibly dangerous to assume race data and categorization include everyone’s experiences. How we disaggregate, or analyze, imperfect race data affects the stories we tell and can either hide or reveal nuances within communities of color. After analyzing data, how we visualize and tell stories can help—or harm—entire communities.    

When we at Catalyst California visualize data by race, we aim to highlight the impact of systemic racism and increase visibility for the most underrepresented groups. In this blog, we’ll outline how we work with community partners to center their stories within the data. We also discuss how thoughtful design makes our data storytelling more effective. Our newly released Data Storytelling Guide dives into more detail on how to be the microphone for the stories your data tells, but not the voice of the people it represents. 

How We Center Affected Communities in Our Visuals and Analysis 

The first step to intentional data visualization is to understand the communities represented by the data. Community members have first-hand experience and can add insight on what the data does and does not capture.   

When analyzing data, we incorporate the stories of community members, acknowledging them as experts, to make sense of the data we see. In some cases, we have found that the issue we set out to research was completely different from what community members saw on the ground. 

For example, during the interviews for a report about children in the Antelope Valley, we learned that, contrary to popular belief, the rise in telehealth worsened access to mental health care for youth. With options for remote work, providers in the Antelope Valley could not get mental health professionals to move to the region to provide important in-person care to youth. In other words, the community context gave us a new frame for the issue. The goal is not to tell the story you think you know; it is to tell the story shared by the people the data represents.

When thinking about the language of our visualizations, we use terms preferred by the community, because race and identity terms depend heavily on cultural factors and even differ across generations. For example, we ask our community partners for their preferred gender-neutral terms—Latine, Latinx, Latine/x/o/a. And whereas youth now might prefer the term Mexican American, members of the farmworker movement identified more strongly with the term Chicano. Similarly, after interviewing community organizations, we decided to use the more representative term “SWANA” (Southwest Asian North African) over the common but problematic term “MENA” (Middle Eastern or North African). That said, we default to using language that is as inclusive as possible to reflect the communities we work with. 

The community-driven approach also acknowledges communities left out of the data, including their stories whenever possible through additional interviews, callouts of community organizations, and use of proxies when exact data is unavailable. Existing data is often imperfect, but acknowledging and supplementing these gaps is crucial to our ongoing goal of telling a better, more complete data story, and promoting racial equity. 

Engaging with community partners for one of our data-driven projects.

How to Craft Thoughtful Design and Present Empathetic Findings 

How we visualize data can amplify lived experiences. Data points don’t tell stories; we do. One of the core principles in our guide is creating the visual design and language with intentionality and a commitment to uplifting the experience, not just the data outputs.

You can do this by using findings-based text in data visuals, putting a focus on the community’s narrative of its experiences and their causes. We want to ensure our language highlights systemic issues that perpetuate injustices. For example, a graph with a neutral name like “Traffic Violation Tickets by Race” tells a more accurate story when titled “Law Enforcement Officers Pull Over Black Drivers More Than Any Other Racial Group.” Both titles describe the same data, but the first omits key context and findings.

Additionally, thoughtful data visualization considers how and when to compare racial groups. We want to compare groups’ experiences but not blame them for the systemic disparities they experience. Every project has a story to tell; in some instances, it is more important to highlight the disproportionate disparity one group is experiencing, while in other cases, highlighting the worst outcomes for several groups tells the more accurate story. 

In either scenario, it is most important to foster empathy and multiracial solidarity across communities. We want people to care even when their own group is not the most affected, and we encourage the data community to always pause and consider whether the context calls for solidarity behind one group experiencing the worst injustice or solidarity behind multiple groups experiencing similar injustices.

The same data points can have different impacts depending on our visualization, design, and narrative choices. If we use a systems-focused findings approach with intention, our data visuals can foster empathy, call attention to the most vulnerable populations, and encourage equitable decision-making. 

Description: Findings-based title in graph with community-determined labels. 
Source: “End Gang Profiling in Southeast San Diego,” Mulholland Graves, Smith, Zhang, Segovia, Khan. Catalyst California. March 25, 2024.  

While we cannot always control the rigor of the data we use, we can control how we use it and the stories we narrate. When visualizing data by race, here are some of our guiding principles from the Data Storytelling Guide:  

  1. Uplift, do not harm, impacted communities  
  2. Prioritize community ownership and power
  3. Emphasize systemic problems and solutions
  4. Make data actionable and accessible
  5. Critically assess data bias and limitations   

Numbers are meaningless if they don’t invite change by amplifying the voices of those they represent. A thriving California uplifts and celebrates all communities. To make that possible, we all need to practice equitable data storytelling. This is the most important step toward creating bodies of work that benefit the lives of all Californians.

Check out our Data Storytelling Guide for more best practices, and read the other blogs in our series on race and data:

Why We Must All Be Advocates When it Comes to Race Data, Elycia Mulholland Graves, August 14, 2024.  

What is Data Erasure and Why It's Important, Jennifer Zhang, August 28, 2024.  

Thank you to Tessie Borden, our Senior Communications Manager, and to Chris Ringewald, Jennifer Zhang, and Elycia Mulholland Graves from our Research and Data Analysis Team for their contributions.