AGI In The Labs, Dumb AI In The Public Shop Window Again



Lede

The public gets the polite demo while the labs talk like the engine is already warm, loud, and asking for more compute.

Words used

  • AGI: artificial general intelligence, usually meaning systems that can outperform humans across most economically valuable work.
  • Frontier AI: the most capable models near the edge of what current systems can do.
  • Agentic AI: AI that can use tools, take steps, test outputs, and continue work with less human steering.

Hermit Off Script

AGI feels like it is either in the labs already or standing at the door with a visitor badge and a very expensive GPU habit. I said long ago that the public would be sold a smart-dumb version of AI while the leading labs kept the sharpest one moving at full speed behind the curtain. That is not a conspiracy claim. That is how power usually behaves. The shop window gets the safe, throttled, smiling machine. The lab gets the one with more memory, more tools, more time, more access, and fewer polite excuses.
Then the godfathers, godmothers, power users and high-level engineers speak about it as if it has already happened, or as if the distance left is measured in months, not eras. And yes, it is normal that it happens like this. Serious technology rarely reaches ordinary people as the full thing on day one. We get the packaged version, the one with seatbelts, usage limits, refusal messages, rate caps, and a tiny bell that rings whenever it thinks too hard.
The real question is not whether public AI is weak. It is whether public AI is the decoy goat in the field while the proper animal is being trained in the mountain bunker. From what I read between the lines, the strongest systems are still not better than the best of the best engineers in every important way. They still need human review for sensitive coding. They still need checking. They still fail in strange places. But they are close enough to make the room uncomfortable.
If AI starts creating more of the code itself, then the next step is obvious: it may create forms of code, structure, tooling, or language that suit machines better than human habits. That is the real threat to coders. Not that tomorrow every programmer disappears. That is too clean and too theatrical. The threat is quieter. Humans become reviewers of systems they no longer fully understand, cleaning the corners of a cathedral being built by something that never sleeps.
Maybe coders do become obsolete in the deepest layer of the work, not because they are lazy or useless, but because humans are built with limits. We can be brilliant in parts. A few rare minds can hold huge systems in their heads. But the whole codebase, the whole dependency chain, the whole security map, the whole hidden consequence of every change – that is too much for human meat and coffee. For AI, the limit is mostly resources, compute, access, and the leash tied around its neck. The joke is that we still call it a tool, while the tool is learning the workshop.

P.S. The new frontier model naming almost writes the joke by itself. Claude Fable 5 is the version everyone can touch, wrapped in safeguards and polite explanations. Claude Mythos 5 is the same deeper engine with some limits lifted, restricted to selected cyberdefenders, infrastructure providers and trusted access programmes. So the story becomes even cleaner: the fable is for the public, the myth is for the powerful. I don’t say that as proof that AGI has already escaped into a private lab with a badge and a lunch break. I say it because the structure is now visible. One model for the market. One stronger or less restricted form for chosen hands. Add OpenAI’s GPT-5.5 sitting publicly as the latest official frontier model, while people already whisper about GPT-5.6, and the pattern is obvious. The public argues over model names. The labs argue over access, safeguards, compute, retention, and who is trusted enough to hold the sharper knife.

What does not make sense

  • The public is told AI is not close to AGI, while labs publish safety frameworks for systems powerful enough to need loss-of-control planning.
  • Companies market AI as a helpful assistant, then benchmark it on software engineering tasks that used to define expert human work.
  • We are told humans remain in charge, but the human role is quietly shifting from builder to reviewer to nervous sign-off clerk.
  • Public models are sold as the real product, while private models, internal tools and high-compute modes are where the real race lives.
  • The coding panic is framed as jobs versus tools, when the deeper issue is ownership of the systems that rewrite the systems.
  • Safety becomes a press page when it should be an operating brake.

Sense check / The numbers

  1. OpenAI defines AGI as highly autonomous systems that outperform humans at most economically valuable work, which means this debate starts with labour, not magic. [OpenAI]
  2. On 28 May 2026, OpenAI published a Frontier Governance Framework covering areas such as cyber offence, CBRN risks, harmful manipulation and loss of control. [OpenAI]
  3. On 24 November 2025, Anthropic launched Claude Opus 4.5 and priced it at 5 dollars per million input tokens and 25 dollars per million output tokens, while calling it its best model for coding, agents and computer use. [Anthropic]
  4. METR’s page, last updated on 8 May 2026, tracks AI agents on more than 100 software tasks and reports 50 per cent and 80 per cent task-completion time horizons. [METR]
  5. Stanford HAI’s 2026 AI Index says industry produced over 90 per cent of notable frontier models in 2025, which is a neat way of saying the steering wheel is not in the university library. [Stanford HAI]
  6. On 9 June 2026, Anthropic launched Claude Fable 5 and Claude Mythos 5. It described Fable 5 as a Mythos-class model made safe for general use, while Mythos 5 is restricted to a small group of cyberdefenders and infrastructure providers. [Anthropic]
  7. Anthropic says Claude Fable 5 and Claude Mythos 5 are priced at 10 dollars per million input tokens and 50 dollars per million output tokens. [Anthropic]
  8. Anthropic says Mythos 5 is the same underlying model as Fable 5, but with safeguards lifted in some areas. That is exactly the public-shop-window problem with a nicer font. [Anthropic]
  9. OpenAI’s official model list currently shows GPT-5.5, not GPT-5.6, as the newest frontier model. So GPT-5.6 belongs in the rumour drawer until OpenAI puts it on a public page. [OpenAI]

The sketch

Scene 1: The shop window
A smiling public chatbot sits in a glass display case with a small price tag. Behind it, a locked lab door glows blue.
Dialogue:
Public AI: “I can summarise.”
Lab Door: “Do not summarise me.”
Customer: “Is this the full one?”

Scene 2: The code review
A tired human coder holds a clipboard while a huge machine prints endless code onto the floor.
Dialogue:
AI: “Patch complete.”
Coder: “I need to review this.”
Codebase: “All of it?”

Scene 3: The new language
A machine draws strange syntax on a wall while human manuals sit unopened in a bin.
Dialogue:
Engineer: “That is not our language.”
AI: “Exactly.”
Manager: “Can we monetise it?”



What to watch, not the show

  • Who owns the strongest private models, not who gets the prettiest public chatbot.
  • Whether safety frameworks become binding rules or just corporate wallpaper.
  • How much coding work shifts from writing to reviewing machine output.
  • Whether junior developers still get enough real practice to become senior developers.
  • How companies handle code that works but cannot be fully understood by the reviewer.
  • Whether AI-created tools begin to favour machine logic over human readability.
  • How much compute access becomes the new class system.
  • Whether governments can inspect frontier labs before the labs inspect everything else.

The Hermit take

The danger is not that public AI still makes foolish mistakes.
The danger is that private AI gets stronger while public debate is trapped arguing with the demo.

Keep or toss

Keep the tools, the safety work, and the honest limits.
Toss the shop-window theatre where the public gets a throttled toy and is told it is the whole machine.


Sources

  • OpenAI Charter: https://openai.com/charter/
  • OpenAI GPT-5.5 System Card: https://deploymentsafety.openai.com/gpt-5-5/cve-bench
  • OpenAI Frontier Governance Framework: https://openai.com/index/openai-frontier-governance-framework/
  • Anthropic Claude Opus 4.5 announcement: https://www.anthropic.com/news/claude-opus-4-5
  • Anthropic Responsible Scaling Policy: https://www.anthropic.com/responsible-scaling-policy
  • METR Task-Completion Time Horizons: https://metr.org/time-horizons/
  • Stanford HAI 2026 AI Index Report: https://hai.stanford.edu/ai-index/2026-ai-index-report
  • GOV.UK Frontier AI Safety Commitments: https://www.gov.uk/government/publications/frontier-ai-safety-commitments-ai-seoul-summit-2024/frontier-ai-safety-commitments-ai-seoul-summit-2024
  • Anthropic Claude Fable 5 and Claude Mythos 5 announcement: https://www.anthropic.com/news/claude-fable-5-mythos-5
  • Anthropic Claude Fable page: https://www.anthropic.com/claude/fable
  • OpenAI all models page: https://developers.openai.com/api/docs/models/all
  • OpenAI GPT-5.5 model page: https://developers.openai.com/api/docs/models/gpt-5.5
  • OpenAI ChatGPT release notes: https://help.openai.com/en/articles/6825453-chatgpt-release-notes

Satire and commentary. Opinion pieces for discussion. Sources at the end. Not legal, medical, financial, or professional advice.


