Ever since ChatGPT shocked the world with its natural language abilities, companies have been seriously exploring AI’s potential and rapidly developing new AI products. Every few weeks, a new model appears, claiming to outperform its competitors based on benchmark datasets. With today’s development toolkits, an AI developer can easily fine-tune a pre-trained model using public or private datasets and deploy it at scale. But in the race to build and release, one critical question is often overlooked: Is my AI application accessible and fair to everyone?
Bias in AI Applications
“Shit in, shit out.” This popular saying in the AI community highlights the importance of training data — a reminder that the quality of the dataset strongly influences the quality of the resulting AI model. Less well known is its sibling: “bias in, bias out.” This phrase points to a critical truth: if your training data contains bias, your AI application will likely carry that bias forward.
When these biased systems are deployed at scale — often without developers even realizing it — the result is an AI application that may work well for the majority groups represented in the data, but poorly for marginalized communities. Rather than improving accessibility and inclusion, these systems can reinforce inequality and deepen social disparities.
One powerful example is the use of AI technology to detect melanoma, a type of skin cancer. AI-powered melanoma detection systems have shown bias toward lighter skin tones, often yielding poorer performance for people with darker skin. This makes the technology less accessible and potentially less reliable for those populations. The core issue lies in the training data — individuals with darker skin are underrepresented, and the datasets used often lack the necessary diversity.
So, what can we as AI developers do to make our applications more accessible to everyone?
Inclusive Design from the Start
Much like the “shift-left testing” approach in software development — where testers get involved early in the process — it’s just as important to involve diverse stakeholders from the very beginning of AI development. Including voices from a range of backgrounds — considering factors such as gender, age, race, ethnicity, disability status, and socioeconomic background — during the ideation and design phases helps ensure that the needs and concerns of different communities are heard. This early involvement can shape data curation, model design, and evaluation metrics in ways that avoid overlooking marginalized groups.
After you have formed your initial idea and design, consider brainstorming with diverse stakeholders and your team about the potential risks of your application before development begins. A helpful resource for this is the RiskStorming card set and workshop format. RiskStorming is a collaborative framework, developed by Beren Van Daele and QualityMinds, that helps teams identify, prioritize, and mitigate potential risks early in the process. By engaging a wide range of perspectives in these sessions, you can uncover blind spots and build a more inclusive and robust AI system from the outset.
Detecting Bias: Analyzing Your Data for Group Disparities
It cannot be emphasized enough: you should thoroughly analyze your training data before training a model. A biased dataset will almost certainly produce a biased model. During data analysis, pay close attention to patterns of bias, especially those that may affect your stakeholders. In addition to standard data science techniques for detecting bias, involve diverse stakeholders in the process: share the biases you have identified and listen to their perspectives and experiences, especially those of people from minority backgrounds. They can often point out relevant data features or patterns you may have overlooked.
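Even a few lines of exploratory code can surface obvious disparities before any training starts. The sketch below is a minimal example using pandas; the file name and the `skin_tone` and `label` columns are illustrative placeholders, not a prescribed schema:

```python
import pandas as pd

# Illustrative training data with a demographic attribute column;
# "training_data.csv", "skin_tone" and "label" are placeholder names.
df = pd.read_csv("training_data.csv")

# How well is each group represented in the dataset?
group_shares = df["skin_tone"].value_counts(normalize=True)
print("Share of samples per group:\n", group_shares)

# Do positive labels occur at very different rates per group?
# Large gaps here often translate into uneven model performance later.
label_rates = df.groupby("skin_tone")["label"].mean()
print("Positive label rate per group:\n", label_rates)
```

Numbers like these are conversation starters: bring them into the stakeholder sessions described above rather than treating them as a final verdict.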
Once you’ve identified potential biases — especially those surfaced through collaboration with diverse stakeholders — the next step is mitigation. This could involve curating more data from underrepresented groups, or downsampling overrepresented ones to balance the distribution. You might also explore data synthesis techniques to expand your dataset in a more inclusive way.
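As a minimal illustration of the downsampling option, the following sketch balances a dataset by sampling every group down to the size of the smallest one. It assumes tabular data in pandas; the `skin_tone` column is a hypothetical sensitive attribute:

```python
import pandas as pd

def downsample_to_smallest_group(df: pd.DataFrame, group_col: str, seed: int = 42) -> pd.DataFrame:
    """Randomly downsample every group to the size of the smallest one."""
    smallest = df[group_col].value_counts().min()
    return (
        df.groupby(group_col, group_keys=False)
          .sample(n=smallest, random_state=seed)
          .reset_index(drop=True)
    )

# Hypothetical usage: balance the training data across skin tones.
# balanced_df = downsample_to_smallest_group(df, "skin_tone")
```

Keep in mind that downsampling discards data, so it is a pragmatic first step rather than a cure; curating or synthesizing additional data for underrepresented groups usually preserves more information.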
One example of such data synthesis is our synthetic dataset of simulated car-pedestrian collisions. Using the MuJoCo physics engine, we generated over one million fully synthetic collision events. To mitigate bias, our simulations use several humanoid models, from children to adults, with varied body proportions, locomotion dynamics, and movement patterns. Each humanoid's characteristics are sampled from uniform distributions to ensure no group is overrepresented. We vary pedestrian initial conditions (position, orientation, speed) and car parameters (vehicle model, velocity, impact angle) to cover the full spectrum of collisions, from low-speed incidents to high-impact events. The dataset supports human motion prediction and facilitates the development of fairer, more robust pedestrian-safety systems for autonomous vehicles. By providing a uniform benchmark, we help ensure that future vehicle-based safety technologies perform reliably for every road user.
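To make the uniform-sampling idea concrete, here is a simplified sketch of how simulation parameters can be drawn from uniform distributions. The parameter names and ranges are hypothetical placeholders for illustration, not the exact configuration used to generate our dataset:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_collision_scenario() -> dict:
    """Draw one simulated car-pedestrian scenario with uniformly sampled parameters.

    The names and ranges below are illustrative placeholders, not the exact
    values used for the published dataset.
    """
    return {
        "pedestrian_height_m": rng.uniform(1.0, 2.0),      # children to adults
        "pedestrian_speed_ms": rng.uniform(0.5, 3.0),       # walking to running
        "pedestrian_heading_deg": rng.uniform(0.0, 360.0),
        "car_speed_ms": rng.uniform(2.0, 25.0),             # low-speed to high-impact
        "impact_angle_deg": rng.uniform(-90.0, 90.0),
    }

# Each sampled scenario would then be handed to the physics simulation.
scenarios = [sample_collision_scenario() for _ in range(1_000)]
```

Because every parameter is drawn uniformly, no body type or scenario class dominates the resulting dataset.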
When facing trade-offs between model performance and fairness, be sure to communicate openly with your stakeholders. Often, it’s more important to deliver a fair and accessible AI application at scale than to release one that performs best on benchmarks but reinforces existing biases.
Evaluate AI Models Fairly
Choose or design your evaluation metrics with all stakeholders in mind. Ask yourself: Do the metrics reflect model performance across all groups? An AI model might perform well on one metric but poorly on another. Off-the-shelf metrics may not fully capture the problem you’re trying to address. If needed, adjust the selected metrics or design custom ones that better align with your goals.
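A practical way to apply this is to report every metric per group rather than only in aggregate. The sketch below assumes a group label is available at evaluation time; the variable names are illustrative, and accuracy and recall stand in for whichever metrics fit your problem:

```python
import pandas as pd
from sklearn.metrics import accuracy_score, recall_score

def per_group_metrics(y_true, y_pred, groups) -> pd.DataFrame:
    """Report accuracy and recall per group instead of a single overall score."""
    results = pd.DataFrame({"y_true": y_true, "y_pred": y_pred, "group": groups})
    rows = []
    for name, g in results.groupby("group"):
        rows.append({
            "group": name,
            "n": len(g),
            "accuracy": accuracy_score(g["y_true"], g["y_pred"]),
            "recall": recall_score(g["y_true"], g["y_pred"], zero_division=0),
        })
    return pd.DataFrame(rows)

# Hypothetical usage:
# report = per_group_metrics(y_test, model.predict(X_test), X_test["skin_tone"])
# A large gap between groups is a red flag, even when the aggregate score looks good.
```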
You can also use interpretability techniques, such as SHAP or LIME, to check whether your model bases its decisions on valid data features or whether it relies on irrelevant or biased ones.
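As a rough illustration of that check, the sketch below builds a SHAP summary plot for a tree-based classifier on tabular data. The model choice and the variables `X_train`, `y_train`, and `X_test` are placeholders assumed to exist; the same idea applies to other model types and to LIME:

```python
import shap
from sklearn.ensemble import RandomForestClassifier

# Placeholder model and data: X_train, y_train and X_test are assumed to be
# tabular features/labels with human-readable column names.
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Summary plot of feature contributions. If a sensitive attribute (or an
# obvious proxy such as a postal code) ranks among the most influential
# features, that is a strong signal to investigate before deployment.
shap.summary_plot(shap_values, X_test)
```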
Communicating the Limitations of Your AI Models
Be honest about your AI model and communicate its capabilities and limitations to stakeholders. Ensure they understand how to correctly interpret the model’s decisions and what those decisions imply — and just as importantly, what they don’t imply. When the model is deployed at scale, always make its limitations clear. If possible, provide usage guidance in various formats, and consider offering open consultation hours to listen to user feedback and offer support.
Send us an email; we look forward to hearing from you! hello@qualityminds.de or on LinkedIn