Wikipedia Partners with Kaggle to Offer AI Data and Reduce Bot Scraping

AI News

Updated: April 18, 2025

Reading Time: 2 minutes

A human handing a robot a stack of files (Wikipedia partners with Kaggle to provide AI data)

Written by:

Lolade

Wikipedia is one of the most visited websites in the world because it is a major source of information for both people and machines.

However, AI bots have been scraping its content at high volumes, putting increasing strain on Wikipedia’s servers.

To address the issue, the Wikimedia Foundation has taken a new approach. Rather than blocking bots, it is offering developers a better option.

On April 17, 2025, the Foundation announced a partnership with Kaggle, a data science platform owned by Google.

The Partnership Offer

The partnership introduces a new, openly licensed dataset that includes structured content from Wikipedia in English and French.

Developers can now access this dataset directly on Kaggle, which removes the need to scrape raw web pages.

The dataset is designed with machine learning in mind. It includes short summaries, article descriptions, image links, infobox data, and organized article sections.

It does not, however, include references, audio files, or other non-text content. The data is provided as JSON, a clean and consistent format that is easy for machines to read.
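For readers who want a feel for what working with such structured content might look like, here is a minimal sketch in Python. It assumes one JSON object per line, and every field name in it (`name`, `description`, `abstract`, `image_url`, `infobox`, `sections`) is a hypothetical stand-in for the kinds of content described above, not the dataset's documented schema. Check the dataset page on Kaggle for the real field names before relying on any of this.

```python
import json

# Hypothetical sample record. The field names are assumptions chosen to
# mirror the content types the article lists (summary, description,
# image link, infobox, sections); they are not the official schema.
SAMPLE_LINES = [
    json.dumps({
        "name": "Example article",
        "description": "A short description",
        "abstract": "A short summary of the article.",
        "image_url": "https://example.org/image.png",
        "infobox": {"Type": "Example"},
        "sections": [
            {"title": "History", "text": "Section text."},
            {"title": "Usage", "text": "More section text."},
        ],
    }),
]


def parse_records(lines):
    """Yield one dict per non-empty line, assuming one JSON object per line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(record):
    """Pick out a few fields, tolerating records where they are missing."""
    return {
        "title": record.get("name", ""),
        "summary": record.get("abstract", ""),
        "section_titles": [s.get("title", "") for s in record.get("sections", [])],
    }


if __name__ == "__main__":
    for record in parse_records(SAMPLE_LINES):
        print(summarize(record))
```

With a real file, the same `parse_records` function could be passed an open file handle instead of the sample list, since iterating over a text file also yields one line at a time.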

What Developers Want

Many AI models rely on Wikipedia for training data, but scraping the site is inefficient and risky.

The new dataset solves this issue by offering clean, ready-to-use information. It also saves time, reduces errors, and makes legal reuse simpler.

Large developers aren’t the only ones with something to gain; smaller teams and individual researchers benefit too. They now have access to the same kind of structured data that was once only available to large tech firms.

Benefits for Wikipedia

Wikipedia gains just as much. The new strategy reduces server load and discourages aggressive scraping. It also allows the Foundation to guide how its content is used.

Kaggle’s Role in the Collaboration

Kaggle is widely used by data scientists to host datasets, public notebooks, and competitions. By partnering with Kaggle, Wikimedia aims for broad access and easy adoption.

Brenda Flynn, Kaggle’s partnerships lead, stated:

“Kaggle is excited to play a role in keeping this data accessible, available, and useful.”

Now, anyone on Kaggle can explore, analyze, and even build projects using Wikipedia’s structured content.

Why Not Just Block the Bots?

Blocking AI bots may seem like a simpler solution, but Wikipedia was built on the idea of openness. Cutting off bots entirely would conflict with its original mission.

Instead, the Foundation chose to meet developers halfway. It offers a cleaner, more reliable option, and in return, it hopes developers will stop scraping and use the official dataset.

Tags:

AI Bots, AI technology, artificial intelligence
