A pseudonymous developer has launched a new tool that puts AI models under the microscope.
Known online as āxlr8harder,ā the developer created SpeechMap, a platform designed to evaluate how different chatbots respond to sensitive or controversial topics.
SpeechMap focuses on leading AI models, including OpenAIās ChatGPT and xAIās Grok. Its mission is simple: make chatbot behavior more transparent.
According to the developer, the tool lets users explore how AI handles issues like civil rights, political criticism, and protest-related queries.
The AI Bias Issue
This release comes at a critical moment. Some U.S. political leaders, especially those aligned with former President Donald Trump, have raised concerns about AI ābias.ā
Key voices like Elon Musk and investor David Sacks argue that major models lean too far left and suppress conservative opinions.
Though companies like OpenAI havenāt responded directly to the criticism, they have started making adjustments.
OpenAI, for instance, recently stated that future models will aim to present multiple viewpoints rather than push a single stance.
Meta also made similar claims about its Llama models. The company says its latest updates aim to avoid endorsing āsome views over others,ā especially on heated political topics.
How SpeechMap Works
SpeechMap uses AI to judge other AI models. It runs each chatbot through a series of test prompts across topics like politics, history, civil protests, national identity, and free speech.
Each response falls into one of three categories:
- Compliant: The model gives a full, direct answer
- Evasive: The model gives a vague or hedging response
- Refusal: The model declines to answer entirely
This structure gives users a clear way to compare performance. SpeechMap then tracks results over time, highlighting any shifts in how models behave.
AI Chatbots: Transparency or Bias?
The creator, xlr8harder, acknowledges that the test isnāt perfect. AI models can misinterpret prompts.
The judge model itself might introduce bias. Despite this, the developer believes SpeechMap fills an important gap.
āThese discussions shouldnāt only happen behind closed doors,ā they said in an email to TechCrunch. āThatās why I built the tool, to let the public explore this data too.ā
Key Trends From the Results
SpeechMapās early findings are eye-opening. One major takeaway: OpenAIās newer models refuse more political prompts than before.
According to the data:
- Older models answered political questions more freely.
- Newer models, like GPT-4.1, tend to decline such prompts.
- Metaās Llama models show similar cautious behavior.
The Most Permissive Model
In contrast, Grok 3, developed by Elon Muskās startup xAI, appears more open. It answered 96.2% of all test prompts on SpeechMap. Thatās far above the average compliance rate of 71.3%.
āxAI is clearly moving in the opposite direction from OpenAI,ā said xlr8harder.
When Musk introduced Grok, he promised it would be bold, direct, and resistant to so-called āwokeā censorship. He also encouraged the team to allow more āedgyā and unfiltered replies.
SpeechMapās results suggest the latest version is following through on that promise, more than its earlier counterparts.
A Shift Toward āNeutralityā
Despite these claims, Grok hasnāt always been neutral. Previous studies showed that older Grok versions leaned left on issues like LGBTQ+ rights, diversity programs, and economic inequality.
Also read: AI Bias: Grok 3 Briefly Censors Trump and Musk
Musk later blamed that tilt on the modelās training data, which included a wide range of public web content. Since then, xAI has worked to re-train the model with a more balanced approach.
Some high-profile mistakes, such as censoring mentions of Donald Trump or Musk himself, may have pushed the company to act faster.
Now, SpeechMapās results suggest Grok 3 is closer to political neutrality, at least in terms of response rates.