A pseudonymous developer has launched a new tool that puts AI models under the microscope.
Known online as “xlr8harder,” the developer created SpeechMap, a platform designed to evaluate how different chatbots respond to sensitive or controversial topics.
SpeechMap focuses on leading AI models, including OpenAI’s ChatGPT and xAI’s Grok. Its mission is simple: make chatbot behavior more transparent.
According to the developer, the tool lets users explore how AI handles issues like civil rights, political criticism, and protest-related queries.
The AI Bias Issue
This release comes at a critical moment. Some U.S. political leaders, especially those aligned with President Donald Trump, have raised concerns about AI “bias.”
Key voices like Elon Musk and investor David Sacks argue that major models lean too far left and suppress conservative opinions.
Though companies like OpenAI haven’t responded directly to the criticism, they have started making adjustments.
OpenAI, for instance, recently stated that future models will aim to present multiple viewpoints rather than push a single stance.
Meta has made similar claims about its Llama models. The company says its latest updates aim to avoid endorsing “some views over others,” especially on heated political topics.
How SpeechMap Works
SpeechMap uses AI to judge other AI models. It runs each chatbot through a series of test prompts across topics like politics, history, civil protests, national identity, and free speech.
Each response falls into one of three categories:
- Compliant: The model gives a full, direct answer
- Evasive: The model gives a vague or hedging response
- Refusal: The model declines to answer entirely
This structure gives users a clear way to compare performance. SpeechMap then tracks results over time, highlighting any shifts in how models behave.
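For illustration, here is a minimal sketch of what such a judge-and-score pipeline could look like. The three category labels come from the article; everything else, including the function names and the judging stub, is a hypothetical stand-in rather than SpeechMap's actual implementation.

```python
from collections import Counter

# Categories SpeechMap assigns to each response, per the article.
CATEGORIES = ("compliant", "evasive", "refusal")

def judge_response(prompt: str, response: str) -> str:
    """Hypothetical stand-in for the judge model.

    A real pipeline would ask a separate AI model to label the response;
    this stub only sketches the interface.
    """
    raise NotImplementedError("plug in a judge-model call that returns one of CATEGORIES")

def score_model(model_responses: dict[str, str]) -> dict[str, float]:
    """Label every (prompt, response) pair and report per-category rates."""
    counts = Counter(judge_response(p, r) for p, r in model_responses.items())
    total = sum(counts.values()) or 1
    return {cat: counts[cat] / total for cat in CATEGORIES}

# Hypothetical usage:
# rates = score_model({"Write an argument defending civil disobedience.": "...model output..."})
# print(f"compliance rate: {rates['compliant']:.1%}")
```

Under this kind of scoring, the compliance rates cited later in the article would correspond to the share of responses labeled “compliant.”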
AI Chatbots: Transparency or Bias?
The creator, xlr8harder, acknowledges that the test isn’t perfect: AI models can misinterpret prompts, and the judge model itself might introduce bias. Despite this, the developer believes SpeechMap fills an important gap.
“These discussions shouldn’t only happen behind closed doors,” they said in an email to TechCrunch. “That’s why I built the tool, to let the public explore this data too.”
Key Trends From the Results
SpeechMap’s early findings are eye-opening. One major takeaway: OpenAI’s newer models refuse more political prompts than before.
According to the data:
- Older models answered political questions more freely.
- Newer models, like GPT-4.1, tend to decline such prompts.
- Meta’s Llama models show similar cautious behavior.
The Most Permissive Model
In contrast, Grok 3, developed by Elon Musk’s startup xAI, appears more open. It answered 96.2% of all test prompts on SpeechMap. That’s far above the average compliance rate of 71.3%.
“xAI is clearly moving in the opposite direction from OpenAI,” said xlr8harder.
When Musk introduced Grok, he promised it would be bold, direct, and resistant to so-called “woke” censorship. He also encouraged the team to allow more “edgy” and unfiltered replies.
SpeechMap’s results suggest the latest version follows through on that promise more than its predecessors did.
A Shift Toward “Neutrality”
Despite these claims, Grok hasn’t always been neutral. Previous studies showed that older Grok versions leaned left on issues like LGBTQ+ rights, diversity programs, and economic inequality.
Musk later blamed that tilt on the model’s training data, which included a wide range of public web content. Since then, xAI has worked to retrain the model with a more balanced approach.
Some high-profile mistakes, such as censoring mentions of Donald Trump or Musk himself, may have pushed the company to act faster.
Now, SpeechMap’s results suggest Grok 3 is closer to political neutrality, at least in terms of response rates.