(Image: an AI-generated clip of Will Smith eating spaghetti)

In 2024, AI Benchmarks Got as Weird as They Could Get

When a new AI video generator hits the market, one of the first tests it often faces isn’t about creating cinematic masterpieces or improving accessibility tools. Instead, it’s tasked with something delightfully absurd: rendering a video of actor Will Smith eating spaghetti.

What started as an internet meme has grown into an unofficial benchmark for AI creativity. It isn’t just about spaghetti; it symbolizes how far AI video generators have come, and how entertaining they can be.

Why Will Smith and Spaghetti Became the AI Meme of 2024

Will Smith himself joined the fun in February, posting a tongue-in-cheek Instagram video of himself “eating” a bowl of spaghetti, poking fun at the viral trend. But why has this odd combination stuck around?

  • It’s simple yet challenging: Making a video of a recognizable celebrity eating spaghetti involves complex image rendering, motion tracking, and maintaining realism, all areas where AI is put to the test.
  • It’s accessible: Unlike technical benchmarks, anyone can judge how realistic (or hilarious) the spaghetti video is, making it more relatable.
  • It’s fun: Let’s be honest, watching an AI’s attempt at recreating such a specific scenario is amusing.

Other Oddball AI Benchmarks That Took Off

Will Smith and pasta aren’t alone in this quirky AI benchmarking trend. In 2024, developers got more creative with a series of unusual tests.

1. Minecraft Architecture by AI

A 16-year-old developer built an app giving AI free rein in Minecraft. The task? Design structures ranging from cozy cottages to sprawling castles.

  • Why it matters: It tests an AI’s ability to plan, design, and execute tasks in a dynamic environment, highlighting its adaptability and creativity.
  • Why it’s fun: Who wouldn’t want to watch an AI build a medieval fortress, or fail spectacularly?

2. Pictionary and Connect 4 Showdowns

Across the pond, a British programmer created a platform where AI systems compete in games like Pictionary and Connect 4.

  • What’s being tested: AI’s ability to interpret abstract concepts (in Pictionary) and strategic thinking (in Connect 4).
  • The appeal: Watching AIs guess or outmaneuver each other provides both entertainment and insight into their decision-making processes.

The Problem with Traditional AI Benchmarks

So, why are these playful tests gaining traction when there are serious academic benchmarks already in place?

1. Lack of Relatability

Academic benchmarks often test AI on tasks like solving Math Olympiad problems or answering PhD-level questions. While impressive, these tasks don’t resonate with the average person.

2. Narrow Focus

Even crowd-driven tools like Chatbot Arena, which let users rate AI performance on tasks like coding or image generation, face challenges. Most participants are tech-savvy individuals, meaning results can be skewed by niche preferences.
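For context on how crowd voting becomes a leaderboard: platforms like Chatbot Arena turn pairwise user votes into rankings with Elo-style rating updates. A minimal sketch of that idea (illustrative constants and starting ratings, not Arena’s actual implementation):

```python
def elo_update(rating_a, rating_b, winner, k=32):
    """Update two models' ratings after one head-to-head vote.

    winner: "a" or "b" -- which model's response the voter preferred.
    """
    # Expected score of A under the standard Elo logistic model
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    score_a = 1.0 if winner == "a" else 0.0
    # Winner gains points, loser loses the same amount
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - expected_a))
    return new_a, new_b

# Two models start equal; model A wins one matchup
a, b = elo_update(1000, 1000, "a")
```

With thousands of such votes, ratings converge toward a stable ordering, but as the article notes, that ordering still reflects whoever is doing the voting.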

3. Missing the Human Factor

Ethan Mollick, a professor at Wharton, noted on X (formerly Twitter) that many benchmarks fail to compare AI systems to the average human. This creates a gap between what AI can do and how people actually use it, whether for drafting emails or brainstorming ideas.

Why Weird Benchmarks Are Here to Stay

These unconventional tests may lack the rigor of academic metrics, but they excel in one key area: engagement.


  • Easy to Understand: You don’t need a computer science degree to see whether AI nailed a spaghetti video or a Minecraft castle.
  • Entertaining: People love watching quirky AI experiments, and these benchmarks often go viral, sparking conversations about AI’s abilities.
  • Bridging the Gap: They make AI technology more relatable, showcasing its potential in everyday contexts.

How the AI Community Can Strike a Balance

Playful tests are engaging, but evaluating AI’s real-world impacts (its role in healthcare, education, or the workplace) remains essential to painting the full picture.

What could help?

  • Developing benchmarks that reflect real-world uses, like how AI aids radiographers or even educators.
  • Including diverse perspectives in evaluating AI, ensuring benchmarks resonate beyond tech circles.

A Glimpse into AI’s Quirky Future

Will AI benchmarks like Will Smith’s spaghetti-eating saga disappear as the technology matures? Unlikely. They’re too engaging, and frankly, too entertaining, to fade away. As AI becomes more integrated into our lives, these oddball tests remind us of its playful side, making cutting-edge tech feel less intimidating and a lot more fun.

We read all the AI news and test the best tools so you don’t have to. Then we send 30,000+ professionals a weekly email showing how to leverage it all to: πŸ“ˆ Increase their income πŸš€ Get more done ⚑ Save time.