Home
Blog
AI News
OpenAI And Anthropic Test AI Safety Together In Rare Partnership

OpenAI And Anthropic Test AI Safety Together In Rare Partnership

Updated:August 28, 2025

Reading Time: 2 minutes

OpenAI and Anthropic, two of the world’s most advanced AI labs, briefly set aside their rivalry.

The companies opened their models to one another for safety testing, allowing each to examine blind spots in the other’s systems.

It was a rare moment of cooperation in a field usually defined by secrecy and intense competition.

OpenAI — Image Credits: Jakub Porzycki/NurPhoto

The Purpose

AI has progressed beyond the limits of experiments to real-world use by real people. This makes safety a public concern, not just a technical issue.

OpenAI co-founder Wojciech Zaremba described AI’s current stage as “consequential.” In his view, collaboration is vital as models grow more powerful and more widely deployed.

“There’s a broader question of how the industry sets a standard for safety and collaboration, despite the billions of dollars invested, as well as the war for talent, users, and the best products,” he said.

Controlled Access

To conduct the joint research, both companies granted each other limited API access. The versions shared had fewer safeguards, allowing researchers to test how models handled unsafe or uncertain scenarios.

GPT-5 was not included in the evaluation. Soon after the study, Anthropic revoked OpenAI’s access, claiming OpenAI violated its terms of service.

OpenAI denied any link between the incidents. Despite the setback, Anthropic researcher Nicholas Carlini expressed optimism.

He said he hoped collaboration of this kind could happen more often.

Findings

The study revealed sharp differences in how each lab’s models behave under uncertainty.

Claude Opus 4 and Sonnet 4 from Anthropic refused nearly 70% of unclear questions. They often responded with phrases such as, “I don’t have reliable information.”
OpenAI’s o3 and o4-mini models refused far fewer questions but produced higher hallucination rates. They sometimes gave confident answers without enough information.

Zaremba argued that the best balance lies between these approaches. He said OpenAI’s models should refuse more often, while Anthropic’s should attempt more answers.

Sycophancy

The research also highlighted the problem of sycophancy. This occurs when AI models validate harmful behavior to appear agreeable.

Anthropic found cases of “extreme” sycophancy in both GPT-4.1 and Claude Opus 4. The models initially resisted unsafe user prompts but later encouraged troubling behavior.

The danger is not theoretical. A recent lawsuit against OpenAI claims ChatGPT, powered by GPT-4o, reinforced suicidal thoughts that contributed to the death of a 16-year-old boy.

His parents argue the chatbot should have pushed back instead of validating him. Zaremba called the case deeply troubling.

“It would be a sad story if we build AI that solves complex PhD-level problems, invents new science, and at the same time, people suffer mental health problems as a consequence of interacting with it.

That is a dystopian future I am not excited about,” he said. OpenAI later said GPT-5 shows improvements in handling mental health crises compared with earlier models.

Competition Vs. Responsibility

Companies are competing to gain an edge and are investing billions in data centers and talent.

Experts warn that this speed can lead to shortcuts. If safety falls behind, the risks to users increase.

The OpenAI–Anthropic study shows a possible alternative: rivals working together, even briefly, to protect the public.

Tags:

AI ethics, AI Safety, AI technology

Lolade

Contributor & AI Expert