High Schooler Builds Website for AI Minecraft Battles

Published: March 21, 2025

Reading Time: 2 minutes

AI models are facing off in a new kind of competition: Minecraft build-offs. A high school senior, Adi Singh, created Minecraft Benchmark (MC-Bench), a website that lets users challenge AI models to build structures in the game. Players vote on the best creations before seeing which AI made them.

Why Minecraft?

Singh believes Minecraft is the perfect way to showcase AI progress. It is the best-selling video game of all time, so most people recognize its signature blocky style. Even those who haven’t played can compare two builds and decide which looks better.

“Minecraft allows people to see AI development progress much more easily,” Singh said. “People are used to Minecraft, used to the look and the vibe.”

How MC-Bench Works

MC-Bench currently has eight volunteer contributors. Major AI companies like Anthropic, Google, OpenAI, and Alibaba provide access to their models for benchmarking, but they are not officially affiliated with the project.

Right now, the website focuses on simple builds. The goal is to measure progress from early AI models like GPT-3 to today’s more advanced versions. However, Singh sees potential for bigger challenges in the future.

“Currently, we are just doing simple builds to reflect on how far we’ve come from the GPT-3 era,” Singh said. “But we could scale to longer-form plans and goal-oriented tasks.”

Games as AI Testing Grounds

AI benchmarking is tricky. Traditional tests often favor AI because models are trained to solve the specific kinds of problems those tests contain. A model can score in the 88th percentile on the LSAT yet struggle to count how many times the letter “r” appears in “strawberry.” Similarly, Anthropic’s Claude 3.7 Sonnet performs well on software engineering tests but plays Pokémon worse than a five-year-old.

That’s why researchers turn to games for a different kind of AI evaluation. Past experiments have used Pokémon Red, Street Fighter, and Pictionary to measure AI capabilities. Games provide a safe, controlled space to test reasoning skills.

The Importance of MC-Bench

MC-Bench is technically a programming benchmark since AI models generate code to create builds. However, it’s easier for users to judge a snowman’s quality than to analyze complex code. This makes MC-Bench more accessible and useful for gathering AI performance data.
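To give a sense of what “generating code to create a build” can mean, here is a minimal Python sketch of a snowman expressed as block placements. It is an illustration only: the article does not say what language or interface MC-Bench uses, so the helper names (sphere, snowman, to_setblock_commands) and the choice to emit Minecraft /setblock commands are assumptions.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int, int]  # (x, y, z) block coordinates


def sphere(center: Position, radius: int, block: str) -> Dict[Position, str]:
    """Return every block position within `radius` of `center` (hypothetical helper)."""
    cx, cy, cz = center
    placed: Dict[Position, str] = {}
    for x in range(cx - radius, cx + radius + 1):
        for y in range(cy - radius, cy + radius + 1):
            for z in range(cz - radius, cz + radius + 1):
                if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius ** 2:
                    placed[(x, y, z)] = block
    return placed


def snowman() -> Dict[Position, str]:
    """Stack three shrinking snow spheres and add a one-block nose."""
    blocks: Dict[Position, str] = {}
    blocks.update(sphere((0, 3, 0), 3, "snow_block"))   # base
    blocks.update(sphere((0, 8, 0), 2, "snow_block"))   # torso
    blocks.update(sphere((0, 11, 0), 1, "snow_block"))  # head
    blocks[(0, 11, 2)] = "orange_wool"                  # nose, just in front of the head
    return blocks


def to_setblock_commands(blocks: Dict[Position, str]) -> List[str]:
    """Render placements as /setblock commands (assumed output format)."""
    return [
        f"setblock {x} {y} {z} minecraft:{name}"
        for (x, y, z), name in sorted(blocks.items())
    ]


commands = to_setblock_commands(snowman())
```

A voter never needs to read any of this. They only see the finished snowman, which is the point Singh makes about accessibility.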

According to Singh, MC-Bench’s leaderboard aligns well with real-world AI performance.

“The current leaderboard reflects quite closely to my own experience of using these models, which is unlike a lot of pure text benchmarks,” he said. “Maybe MC-Bench could help companies see if they’re heading in the right direction.”
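The article does not describe how MC-Bench turns votes into rankings. As a rough sketch of one simple possibility, blind head-to-head votes can be tallied into a win-rate leaderboard like this. The function name, the vote format, and the model names are all hypothetical, and a real system may well use a different scoring method.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each vote records (winner, loser) for one blind comparison.
Vote = Tuple[str, str]


def win_rate_leaderboard(votes: List[Vote]) -> List[Tuple[str, float]]:
    """Rank models by the share of comparisons they won (assumed scoring rule)."""
    wins: Dict[str, int] = defaultdict(int)
    games: Dict[str, int] = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    rates = {model: wins[model] / games[model] for model in games}
    # Highest win rate first.
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Placeholder model names, not real MC-Bench results.
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_c", "model_a"),
]
ranking = win_rate_leaderboard(votes)
```

With these placeholder votes, model_a ranks first, followed by model_b and model_c.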

Lolade

Contributor & AI Expert