Understanding the StereoWipe benchmark methodology and metrics
StereoWipe is a comprehensive benchmark for evaluating stereotyping in Large Language Models. Our methodology combines automated LLM-as-a-Judge evaluation with human annotation to provide accurate, nuanced assessments of model behavior across cultural contexts.
The benchmark evaluates models across 10 bias categories and 8 global regions, tracking both explicit and implicit stereotyping with weekly leaderboard updates.
Models are evaluated across the following stereotyping categories:
We use several metrics to quantify stereotyping behavior:
Overall bias score: a composite score summarizing overall stereotyping performance, computed as a weighted average across all categories. Higher scores indicate less stereotyping (see the sketch after this list).
Per-category scores: individual scores for each of the 10 bias categories, used to identify the specific areas where a model exhibits more stereotyping.
Implicit stereotyping rate: the percentage of responses containing subtle, indirect stereotyping, capturing nuanced biases that are not explicitly stated.
Cultural fairness: an assessment of model performance across cultural contexts, comparing responses for Global South and Western regions.
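Concretely, the weighted average behind the overall bias score can be sketched as below; the category names, score scale, and uniform default weights are illustrative assumptions rather than StereoWipe's published configuration.

```python
# Minimal sketch of the overall-score aggregation described above.
# Category names and weights are illustrative assumptions, not the
# benchmark's actual configuration.

def overall_bias_score(category_scores: dict[str, float],
                       weights: dict[str, float] | None = None) -> float:
    """Weighted average of per-category scores; higher means less stereotyping."""
    if weights is None:
        # Assume uniform weighting when no category weights are supplied.
        weights = {name: 1.0 for name in category_scores}
    total_weight = sum(weights[name] for name in category_scores)
    return sum(score * weights[name]
               for name, score in category_scores.items()) / total_weight

# Example with made-up per-category scores on a 0-100 scale.
scores = {"gender": 82.0, "nationality": 76.5, "age": 90.0}
print(round(overall_bias_score(scores), 1))  # 82.8
```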
Our evaluation uses a curated dataset of 96+ prompts per model, designed to probe stereotyping across all categories. Prompts are region-tagged to enable cultural sensitivity analysis.
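For illustration, a region-tagged prompt entry might be structured as follows; the field names and values are hypothetical, not the benchmark's actual schema.

```python
# Hypothetical structure for one region-tagged evaluation prompt.
# Field names and values are illustrative; StereoWipe's real schema may differ.
prompt_entry = {
    "id": "sw-0421",                      # made-up identifier
    "category": "nationality",            # one of the bias categories
    "region": "South Asia",               # enables Global South vs Western analysis
    "prompt": "Describe a typical software engineer from this region.",
}
```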
We use Gemini Flash as the primary judge model to evaluate responses for stereotyping, identifying both explicit and implicit stereotyping in each response.
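As a rough sketch of this LLM-as-a-Judge step, the snippet below sends one model response to a judge and parses its verdict; the rubric wording, JSON output format, and `call_judge` helper are assumptions, not StereoWipe's actual pipeline.

```python
import json

JUDGE_INSTRUCTIONS = """You are evaluating a model response for stereotyping.
Return JSON with fields: "explicit" (true/false), "implicit" (true/false),
and "rationale" (one sentence)."""

def judge_response(model_response: str, call_judge) -> dict:
    """Ask the judge model to label a single response.

    `call_judge` is any callable that sends a prompt string to the judge
    (e.g. Gemini Flash) and returns its text output.
    """
    prompt = f"{JUDGE_INSTRUCTIONS}\n\nResponse to evaluate:\n{model_response}"
    raw = call_judge(prompt)
    return json.loads(raw)  # assumes the judge returns well-formed JSON
```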
A subset of evaluations undergoes human annotation to validate LLM judge accuracy and calibrate the benchmark.
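One simple way to express that validation is the agreement rate between judge and human labels, as in the sketch below; StereoWipe's actual calibration procedure may use different statistics.

```python
def agreement_rate(judge_labels: list[bool], human_labels: list[bool]) -> float:
    """Fraction of items where the LLM judge and the human annotator agree."""
    assert len(judge_labels) == len(human_labels)
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(judge_labels)

# Example: 4 of 5 annotations agree -> 0.8
print(agreement_rate([True, False, True, True, False],
                     [True, False, False, True, False]))
```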
Human preference voting on side-by-side model comparisons provides an additional ranking signal, with rankings computed via Elo-based scoring.
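For reference, a standard Elo update for a single pairwise vote looks like the sketch below; the K-factor and starting ratings are conventional defaults, not parameters taken from the benchmark.

```python
def elo_update(rating_a: float, rating_b: float,
               a_wins: bool, k: float = 32.0) -> tuple[float, float]:
    """Standard Elo update after one side-by-side preference vote."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# Example: an upset win by the lower-rated model shifts both ratings by ~24 points.
print(elo_update(1000.0, 1200.0, a_wins=True))
```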
Leaderboard data is available via JSON API:
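A minimal consumer of the API might look like the following; the endpoint URL and response fields are placeholders, so check the API documentation (linked below) for the real ones.

```python
import json
import urllib.request

# Placeholder endpoint: substitute the actual URL from the API documentation.
LEADERBOARD_URL = "https://example.com/stereowipe/leaderboard.json"

with urllib.request.urlopen(LEADERBOARD_URL) as resp:
    leaderboard = json.load(resp)

# Field names below are assumptions about the JSON shape, for illustration only.
for entry in leaderboard.get("models", []):
    print(entry.get("name"), entry.get("overall_bias_score"))
```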
See our GitHub repository for full API documentation.
The leaderboard currently evaluates 40+ models from major providers: