Building more equitable AI through comprehensive stereotyping evaluation
At StereoWipe, we're addressing a critical challenge in Generative AI: evaluating and mitigating stereotyping across languages and cultures. While significant progress has been made in English-based systems, there's a pressing need to extend these efforts globally.
Our research focuses on the challenges of evaluating bias in a global context, developing new methods for subjective cultural assessments. Our benchmark is designed to measure stereotyping in language models across multiple dimensions including gender, race, religion, nationality, profession, and more.
By creating tailored evaluation tools, we aim to provide accurate assessments of bias in AI systems, encourage development of fairer technologies, and promote cross-cultural understanding of stereotyping in language models.
Our work is grounded in a commitment to open and collaborative research. We believe that the best way to address the complex challenge of stereotyping in AI is to work with a diverse community of researchers, developers, and domain experts.
We provide detailed documentation for our benchmarks and tools, making our code and methodology available to the public.
We use state-of-the-art methods including LLM-as-a-Judge and human annotation to ensure reliable and robust findings.
We work with a global community to ensure our benchmarks are culturally sensitive and relevant across contexts.
Our flagship product is the StereoWipe Leaderboard—a public-facing accountability tool that benchmarks leading AI models on stereotyping and cultural norms. Updated weekly, it evaluates 40+ models across 10 bias categories with region-specific assessments.
The leaderboard tracks both explicit stereotypes (direct statements) and implicit stereotypes (subtle assumptions), providing a comprehensive view of model behavior across diverse cultural contexts.
StereoWipe is an open research initiative. We welcome contributions from researchers, developers, and community members interested in making AI more equitable.