Smart AI? No – Just Leaderboards, Hype, and Panic Buttons


A large digital leaderboard glows in a lab setting while a person at a kitchen table looks at a small note asking for a simple truthful answer.

Lede

We keep calling it “smart” because it aces exams, then act surprised when it cannot hold a human conversation without sounding like a toaster with a PhD.


Superintelligence Will Drive Us to Extinction and We Cannot Stop It 🤖 | 🎙️ Roman Yampolskiy



Hermit Note: Always watch and listen to both sides of the AI debate, especially the doomers – they are the smoke alarm, even when they annoy you.

What does not make sense

  • If a model is “smarter” but less reliable, that is not intelligence – it’s confidence with bad eyesight.
  • People argue “control” as if it’s a safety plan, when it’s often just a power play with better PR.
  • We say “we cannot stop it” while simultaneously selling subscriptions, APIs, and developer evangelism.
  • “Human-level” gets treated like a finish line, when it is just the starting pistol for misuse at scale.
  • The ant analogy keeps getting used as a threat, but it mainly describes human cruelty, not inevitable machine destiny.

Sense check / The numbers

  1. Gemini 3 Pro began rolling out in preview on 18 November 2025, and Gemini 3 Flash launched on 17 December 2025 – so yes, the hype is recent, not ancient scripture. [Google]
  2. OpenAI says GPT-5.2 started rolling out on 11 December 2025, and its GPT-5.2 system card reports an “under 1 per cent” hallucination rate across five evaluated domains with browsing enabled, which is a meaningful improvement, but also very conditional. [OpenAI]
  3. Claude Sonnet 4.5 was announced on 29 September 2025, and Anthropic highlights OSWorld performance at 61.4 per cent – a benchmark win, not a guarantee of everyday usefulness or truthfulness. [Anthropic]
  4. The Future of Life Institute “Pause Giant AI Experiments” letter was published on 22 March 2023, and a PDF snapshot published later records 27,565 signatures at that point – proof that the panic has been institutional for a while, even if the arguments keep recycling. [FLI]
  5. Roman Yampolskiy is a University of Louisville computer science professor, and the university states he coined the term “AI safety” in a 2011 publication – so the doomer lane is not new, just newly amplified. [University of Louisville]

The sketch

Scene 1: The Leaderboard Cathedral
A shiny altar labelled “ELO” with engineers kneeling, feeding it tokens.
Engineer: “It scored higher!”
User (outside in the rain): “Can it answer my simple question?”
Altar (glowing): “NEXT BENCHMARK.”
Scene 2: The Doomer vs Booster Punch-Up
Two pundits in a studio, one holding a mushroom cloud, the other holding a unicorn.
Doomer: “Extinction is inevitable!”
Booster: “Paradise is inevitable!”
A janitor (labelled “Reality”) mops up: “It’s inevitable you both monetise this.”
Scene 3: The Control Room
A politician and a CEO share one button marked “CONTROL”. A citizen sits in a corner marked “CONSENT”.
CEO: “We need guardrails.”
Politician: “We need oversight.”
Citizen: “You mean you need the steering wheel.”
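The “ELO” altar in Scene 1 is the Elo rating system that Chatbot Arena popularised for ranking chatbots from pairwise votes. A minimal sketch of the classic update rule – the K-factor of 32 and the 400-point scale are the conventional chess defaults, assumed here for illustration; Arena’s production methodology is more elaborate:

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One Elo update for player A. score_a is 1.0 for a win, 0.5 draw, 0.0 loss."""
    # Win probability for A implied by the current rating gap.
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    # Move A's rating towards the result, scaled by how surprising it was.
    return r_a + k * (score_a - expected_a)

# A favourite that wins gains little; an underdog that wins gains a lot.
print(round(elo_update(1200, 1000, 1.0), 1))  # ≈ 1207.7 (expected win, small gain)
print(round(elo_update(1000, 1200, 1.0), 1))  # ≈ 1024.3 (upset, big gain)
```

The asymmetry is the point: ratings only encode who beat whom in head-to-head votes, which says nothing about whether either answer was true.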



What to watch, not the show

  • Benchmarks as marketing, not measurement – Goodhart’s Law wearing a hoodie.
  • Competition dynamics: if one lab ships, everyone ships, safety memos or not.
  • Truthfulness vs fluency trade-offs: the model that sounds calm can still be wrong.
  • Centralised control: the same humans who broke trust now selling “alignment”.
  • Spiritual illiteracy: we optimise what we can count, then wonder why we feel empty.

Goodhart’s Law states: “When a measure becomes a target, it ceases to be a good measure,” meaning people will manipulate a metric to meet the target, making the original measure unreliable for assessing true performance. This leads to gaming the system, unintended consequences, and focusing on the number rather than the underlying goal, as seen in examples like breeding cobras for bounties or producing tiny nails to meet a quantity target.
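The cobra-and-nails pattern maps straight onto leaderboards. A toy sketch, with models and data invented purely for illustration: a “crammer” that memorises the public benchmark aces the measure and fails the goal, while a model that actually learns the task does the reverse.

```python
# Public benchmark (the measure) and held-out questions (the goal).
public_benchmark = {"2+2": "4", "3+3": "6", "5+5": "10"}
held_out = {"7+7": "14", "9+9": "18"}

def crammer(q):
    # Optimises the measure: memorised answers, confident nonsense elsewhere.
    return public_benchmark.get(q, "42")

def learner(q):
    # Optimises the goal: actually performs the addition.
    a, b = q.split("+")
    return str(int(a) + int(b))

def score(model, dataset):
    # Fraction of questions answered correctly.
    return sum(model(q) == ans for q, ans in dataset.items()) / len(dataset)

print(score(crammer, public_benchmark))  # 1.0 – leaderboard glory
print(score(crammer, held_out))          # 0.0 – reality
print(score(learner, held_out))          # 1.0 – boring competence
```

Once the benchmark becomes the target, the crammer and the learner are indistinguishable on the scoreboard – which is exactly when the scoreboard stops measuring anything.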

The Hermit take

Stop worshipping the scoreboards.
Start judging tools by whether they make humans kinder, freer, and more awake.

Keep or toss

  • Keep: the push for lower hallucination rates and real-world evaluation.
  • Toss: the “smart AI” bragging until it’s ordinary, honest, and human-safe by default.

Sources

  • Google – A new era of intelligence with Gemini 3 (18 Nov 2025): https://blog.google/products/gemini/gemini-3/
  • Google – Gemini 3 Flash: frontier intelligence built for speed (17 Dec 2025): https://blog.google/products/gemini/gemini-3-flash/
  • Google Cloud – Gemini 3 Flash model doc (release date and details): https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/3-flash
  • OpenAI – Introducing GPT-5.2 (11 Dec 2025): https://openai.com/index/introducing-gpt-5-2/
  • OpenAI – Update to GPT-5 System Card: GPT-5.2 (PDF, 11 Dec 2025): https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944f8d/oai_5_2_system-card.pdf
  • Anthropic – Introducing Claude Sonnet 4.5 (29 Sept 2025): https://www.anthropic.com/news/claude-sonnet-4-5
  • Future of Life Institute – Pause Giant AI Experiments: An Open Letter (22 Mar 2023): https://futureoflife.org/open-letter/pause-giant-ai-experiments/
  • Future of Life Institute – Pause Giant AI Experiments PDF (signature count snapshot): https://futureoflife.org/wp-content/uploads/2023/05/FLI_Pause-Giant-AI-Experiments_An-Open-Letter.pdf
  • University of Louisville – Q&A with Roman Yampolskiy (15 Jul 2024): https://louisville.edu/news/qa-uofl-ai-safety-expert-says-artificial-superintelligence-could-harm-humanity
  • University of Louisville – Roman V. Yampolskiy faculty page: https://engineering.louisville.edu/faculty/roman-v-yampolskiy/
  • arXiv – Artificial Intelligence Safety and Cybersecurity timeline (Yampolskiy, 2016 PDF): https://arxiv.org/pdf/1610.07997
  • LMSYS – Chatbot Arena blog (leaderboard and Elo framing, 3 May 2023): https://lmsys.org/blog/2023-05-03-arena/
  • YouTube – Superintelligence Will Drive Us to Extinction, and We Cannot Stop It (Jon Hernandez AI, featuring Roman Yampolskiy): https://www.youtube.com/watch?v=zYs9PVrBOUg

Satire and commentary. Opinion pieces for discussion. Sources at the end. Not legal, medical, financial, or professional advice.


