Did DeepSeek Train Its AI on Gemini? What We Know So Far

Updated: June 3, 2025

Reading Time: 3 minutes

The AI world is buzzing again.

This time, it's over a new claim that Chinese AI lab DeepSeek may have used outputs from Google's Gemini to train its latest model.

While there’s no hard proof yet, several experts believe the signs are too loud to ignore.

Let’s unpack the story in plain English.

What’s DeepSeek and What Did They Release?

Last week, DeepSeek launched a new version of its reasoning-focused AI, called R1-0528.

The model scored high on math and coding benchmarks, which caught the attention of developers and researchers alike.

But something seemed off.

DeepSeek didn’t say where its training data came from, and that raised eyebrows.

Why Are People Pointing Fingers at Gemini?

A developer from Melbourne, Sam Paech, ran tests and noticed something strange.


His findings?

The words and phrasing used by R1-0528 felt very similar to those used by Gemini 2.5 Pro, Google’s powerful AI model.

He wasn’t alone.

Another anonymous developer, known for creating a free-speech evaluation tool for AI, also chimed in.

He said the model’s “thoughts” – the step-by-step process it takes to get an answer – looked a lot like Gemini’s.

It’s not proof. But it’s enough to raise questions.

DeepSeek Has Faced Similar Accusations Before

This isn’t the first time DeepSeek has been in the spotlight for this.

Back in December, developers noticed its V3 model often referred to itself as "ChatGPT", OpenAI's popular chatbot.

That raised suspicions that it may have been trained using logs from ChatGPT itself.

To add fuel to the fire, earlier this year, OpenAI told the Financial Times that it had found signs that DeepSeek was using a process called distillation – a method that involves training a smaller model using outputs from a larger, more advanced one.

And according to Bloomberg, Microsoft discovered that large chunks of data were being taken from OpenAI developer accounts in late 2024.

These accounts were reportedly tied to DeepSeek.

Wait, What’s Distillation?

Here’s a quick breakdown of what distillation is:

  • Distillation: Teaching a new model using the answers generated by a stronger model.
  • Why it matters: It saves time, data, and money, but it can also cross ethical lines.

Distillation isn’t illegal on its own.

But OpenAI’s rules clearly state that you can’t use its models to train your own competing AI.
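To make the idea concrete, here is a toy sketch of distillation: a small "student" model is fit to a "teacher's" answers instead of to original labeled data. Everything below (the teacher function, the linear student, the data) is illustrative, not how any real lab trains its models:

```python
def teacher(x):
    """Stand-in for a large, expensive model: here just a fixed function."""
    return 2.0 * x + 1.0

# Step 1: query the teacher to build a synthetic training set.
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
soft_labels = [teacher(x) for x in inputs]

# Step 2: fit a small student (y = w*x + b) to the teacher's answers
# using plain gradient descent on squared error.
w, b = 0.0, 0.0
lr = 0.02
for _ in range(5000):
    grad_w = grad_b = 0.0
    for x, y in zip(inputs, soft_labels):
        err = (w * x + b) - y
        grad_w += 2 * err * x
        grad_b += 2 * err
    w -= lr * grad_w / len(inputs)
    b -= lr * grad_b / len(inputs)

print(f"student learned w={w:.2f}, b={b:.2f}")  # approaches the teacher's 2 and 1
```

The student never sees ground-truth data, only the teacher's outputs, which is exactly why providers treat large-scale querying of their models as a training shortcut they want to police.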

But Could It All Just Be a Coincidence?

Maybe. Experts say many AI models now sound alike.

That’s because the internet, the source of most training data, is now packed with AI-generated content.

Bots are posting on Reddit and X.

Content farms are flooding Google search with AI-written articles.

It’s like trying to cook a fresh meal when your ingredients are already leftovers from another dish.

This kind of AI “contamination” makes it hard to know where one model ends and another begins.

Expert Opinions: Could DeepSeek Be Copying?

Some researchers think it’s possible.

Nathan Lambert, a scientist from nonprofit AI2, said on X that if he were in DeepSeek’s shoes, he’d consider using outputs from top models too.

His reasoning? DeepSeek may have money but not enough powerful GPUs.

In simple terms: It’s faster to rent the answers than to find them yourself.

AI Companies Are Tightening Security

To fight back against this kind of copying, AI companies are locking things down.

Here’s what’s changed:

  • OpenAI now asks users to verify their identity before using advanced models. A government-issued ID from a supported country is required, and China isn’t on that list.
  • Google has started “summarizing” the model’s internal steps to hide how answers are formed.
  • Anthropic, another big AI player, says it will begin doing the same.

These moves make it harder for rivals to mimic their technology.

The Big Picture

This debate reveals how fierce the AI race has become. Models like Gemini and GPT-4 take years to build.

So when a newer lab like DeepSeek suddenly makes big gains, people start asking: Did they really build this on their own?

Whether or not DeepSeek used Gemini’s output, one thing’s clear – AI companies are stepping up to protect their secrets.

And with billions at stake, you can bet this won’t be the last controversy we see.

Quick Recap

Here’s a summary of what’s going on:

  • DeepSeek released R1-0528: raised eyebrows for high performance and unknown training data.
  • Experts suspect Gemini influence: traces and language patterns look similar.
  • DeepSeek has faced past accusations: its V3 model once claimed to be ChatGPT.
  • Distillation in question: OpenAI and Microsoft traced data leaks to DeepSeek-linked accounts.
  • AI companies respond: tightening access and summarizing model behavior to stop copying.

Onome

Contributor & AI Expert