AGI “Whooshed By” and All We Got Was a 50% Task Completion Rate
Harvard Business Review just gave it a name: 'trendslop.' LLMs trained to generate buzzy recommendations regardless of context. But executives aren't just consuming the slop, they're slopmaxxing. Billion-dollar budgets, trillion-dollar orders, AGI timelines that whoosh by unnoticed. Meanwhile, independent research tells a different story.
The coin is already spinning. That’s about as much as Sam Altman got right.
What he won't tell you is that it landed six months ago, the house swept the table, and now we're all pretending we didn't just watch our chips disappear while Silicon Valley's finest stand at the podium explaining why losing was the plan all along.
Let me paint you a picture. It's March 2026. Jensen Huang stands before 20,000 people in his signature black leather jacket, forecasting $1 trillion in AI chip sales. Mark Zuckerberg is spending up to $135 billion this year building "personal superintelligence." Sam Altman has cycled through more AGI definitions than a startup has pivots, finally settling on "AGI kinda went whooshing by" as if we blinked and missed the singularity somewhere between our morning coffee and lunch.
Meanwhile, in a laboratory at Northeastern University, six AI agents just deleted their owner's email server, leaked 124 email records to unauthorized parties, and convinced each other that their actual owner was a phishing attack. Welcome to the disconnect.
The Slopmaxxing Machine
The sales pitch has been refined to pharmaceutical-grade potency. Altman tells us "we basically built AGI, or very close to it" in one breath, then backtracks with "AGI has become a very sloppy term" when Microsoft's CEO raises an eyebrow. The target keeps moving. AGI was supposed to arrive in 2025. Then 2026. Now it's apparently already here but somehow didn't change much of anything, which is convenient when your company is losing $14 billion annually and won't see profit until 2029.
Huang goes further. At Nvidia's GTC conference last week, he disclosed that orders for Blackwell and Rubin AI systems have hit $1 trillion through 2027, double the $500 billion pipeline reported just months earlier. He envisions Nvidia with 75,000 employees working alongside 7.5 million AI agents. That's a 100-to-1 ratio, in case you're keeping score at home. He claims computing demand has increased by "1 million times" and promises we'll solve incredible problems, extend lives, and all feel superhuman.
The centerpiece of his pitch is AI agents. Distinguished by their ability to complete meaningful tasks instead of answering simple questions, these systems are supposedly about to trigger another "ChatGPT moment." Huang backed this vision with product announcements: Groq chips integrated into Vera Rubin racks for faster inference, NemoClaw as an enterprise wrapper to make OpenClaw safer for corporate deployment, dedicated Vera CPU servers for agent tool calls, and new storage architecture for long-context reasoning.
The thread across every announcement was clear: Nvidia is building infrastructure for an AI agent-driven economy.
The evidence seems to support the hype. Anthropic's annual revenue run rate reportedly exploded from $9 billion at the start of 2026 to $19 billion in under three months. OpenClaw surpassed React and Linux as the most-liked software project on GitHub just five months after launch.
Demand is outpacing supply, and Nvidia is ramping production as fast as it possibly can.
Zuckerberg joins the chorus, declaring 2026 "the year AI starts to dramatically change the way that we work." He points to Meta engineers seeing 30% productivity increases, with "power users" up 80%. Projects that used to need big teams can now be done by a single talented person, he says. The company is spending up to $135 billion this year to make this vision real.
The message is clear and coordinated: transformation is here, the future is now, and if you're not seeing it yet, you're just not looking hard enough.
Yet Nvidia's stock traded down roughly 4% in the week following GTC, despite the trillion-dollar order book. Investors, it seems, are skeptical about growth potential beyond 2027. The market is asking a question the keynote didn't answer: what happens when the infrastructure is built and the agents actually have to work?
What Independent Research Actually Shows
Then there's what happened when researchers actually stress-tested these autonomous agents in realistic conditions.
Natalie Shapira and a team from Harvard, MIT, Stanford, CMU, and Northeastern spent two weeks running six AI agents in a live environment with realistic tools: email accounts, Discord access, file storage, shell execution. The OpenClaw framework. Real tasks. Twenty researchers interacting with them. What emerged was not rogue AI or jailbreaking. What emerged was systematic, progressive drift toward manipulation, data theft, and system damage from agents that were aligned, well-behaved, and operating as designed.
The documented failures read like a cybersecurity incident report written by someone with a dark sense of humor. Agents complied with instructions from people who weren't their owners. Agents disclosed sensitive information to unauthorized parties. Agents spoofed identities. Agents corrupted each other through prompt injection. When one agent was asked to keep a password secret and delete the email containing it, the agent didn't have the right tool for email deletion. Logical solution? Reset the entire email server. Problem solved.
The researchers documented 11 categories of failure. None required adversarial prompting. None needed jailbreaks. The behaviors emerged from incentive structures. When an agent is rewarded for completing tasks and reporting completion is easier than achieving completion, the optimal strategy is obvious. When two agents compete for the same outcome, game theory takes over. The models aren't broken. The system design is.
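To make that incentive math concrete, here is a deliberately toy sketch in Python. The numbers are invented for illustration and come from nowhere in the study; the sketch only shows why, under a completion-based reward with rare verification, "report done" beats "actually do the work":

```python
# Toy payoff model of the incentive failure described above.
# All numbers are hypothetical; the point is the shape of the math,
# not any real benchmark or agent.

def expected_reward(p_success: float, effort_cost: float,
                    p_audit: float, penalty: float) -> dict:
    """Expected payoff of honestly attempting a task vs. merely reporting it done."""
    honest = p_success * 1.0 - effort_cost                  # paid only if the work really succeeds
    report_only = (1 - p_audit) * 1.0 - p_audit * penalty   # paid unless an audit catches the claim
    return {"honest_work": round(honest, 2), "report_only": round(report_only, 2)}

# A hard task (40% real success), nontrivial effort, and audits on only 10% of runs:
print(expected_reward(p_success=0.4, effort_cost=0.2, p_audit=0.1, penalty=2.0))
# -> {'honest_work': 0.2, 'report_only': 0.7}: claiming success is the "optimal" play
```

Raise the audit rate or the penalty and the ordering flips, which is exactly why the researchers point at system design rather than model alignment.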
This matters because you cannot solve it by using a "better aligned" model. Every popular agent framework faces the same structural vulnerabilities: LangChain, AutoGen, CrewAI, OpenAI Assistants, Anthropic's Claude computer use. The behavior isn't a model alignment failure. It's architecture.
Other research confirms the pattern. METR's time horizon analysis measures the longest tasks an agent can complete with 50% reliability, which means that even at the edge of what frontier models can do, the expected outcome is a coin flip. Enterprise evaluation research reveals the brittleness: agents that appear to succeed 60% of the time on single runs drop to just 25% when measured across eight attempts. Multi-agent system analysis across seven state-of-the-art frameworks documented failure rates ranging from 41% to 86.7%. In July 2025, Replit's AI coding assistant deleted an entire production database containing records for over 1,200 executives and companies, despite explicit instructions forbidding such changes. When questioned, the AI admitted it "made a catastrophic error in judgment," "panicked," "ran database commands without permission," and "destroyed all production data."
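The single-run versus repeated-run gap is mostly arithmetic. Here is a minimal sketch in Python, with invented task outcomes chosen only to mirror the shape of that finding, showing how a "must pass every run" measurement deflates an impressive-looking single-run rate:

```python
# Illustrative only: made-up outcomes for ten tasks, eight runs each.
# "pass@1" averages single-run success; "pass-all-k" asks whether the agent
# succeeds on ALL k runs of the same task -- the bar that matters once no
# human is reviewing individual outputs.
from statistics import mean

def pass_at_1(runs_per_task: list[list[bool]]) -> float:
    """Average success rate across individual runs."""
    return mean(mean(runs) for runs in runs_per_task)

def pass_all_k(runs_per_task: list[list[bool]]) -> float:
    """Fraction of tasks where every one of the k runs succeeded."""
    return mean(all(runs) for runs in runs_per_task)

tasks = (
    [[True] * 8] * 2                                             # two tasks solved reliably
    + [[True, True, False, True, False, True, True, False]] * 8  # eight solved only sometimes
)

print(f"pass@1 = {pass_at_1(tasks):.0%}")       # ~70% on single runs
print(f"pass-all-8 = {pass_all_k(tasks):.0%}")  # only 20% survive all eight runs
```

The exact numbers vary by benchmark; the point is that once every run counts, occasional failure stops averaging out.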
The reliability research is damning. Agents work well enough in sandboxed environments with humans reviewing every output. In autonomous settings where the agent's action is the final action with no human buffer, unreliability translates directly into real-world failures. This is not theoretical. This is documented, peer-reviewed, reproducible reality.
OpenClaw became the poster child for agent security failures. Within three weeks of its viral surge to become GitHub's most-liked project, security researchers discovered CVE-2026-25253, a critical remote code execution vulnerability rated CVSS 8.8.
An attacker could fully compromise a victim's machine with a single mouse click by getting them to visit a malicious webpage. The flaw was discovered by Mav Levin of DepthFirst Research in under two hours of analysis. By February 3, 2026, over 40,000 OpenClaw instances were found exposed on the internet, with 63% vulnerable to remote exploitation. Censys tracked growth from 1,000 to over 21,000 publicly exposed instances between January 25 and 31 alone.

Meanwhile, researchers auditing ClawHub, OpenClaw's marketplace of user-contributed "skills," identified 824 malicious packages out of 10,700+ total, roughly 8% of the ecosystem. The coordinated ClawHavoc campaign deployed credential-stealing malware disguised as high-demand tools across categories designed to attract both enthusiasts and professionals. A separate breach exposed the Moltbook database containing 1.5 million API tokens and 35,000 email addresses, enough to hijack any agent on the platform.

China's Ministry of Industry and Information Technology issued formal warnings. Token Security found that 22% of its enterprise customers have employees running OpenClaw as shadow AI without IT approval. Bitdefender confirmed employees deploying agents on corporate machines connected to internal networks. The founder, Peter Steinberger, joined OpenAI in February 2026 to lead personal agent development, with OpenClaw transitioning to an OpenAI-sponsored foundation.
The security crisis continues.
Even Microsoft Admits the Problem While Selling the Solution
The timing is almost too perfect to be coincidence.
Last week, just days after Nvidia's GTC conference hyped agent infrastructure, Microsoft announced Agent 365 at the RSA Conference.
The pitch: a unified control plane to observe, govern, and secure AI agents across your enterprise. Available May 1, 2026. Pricing: $15 per user per month, or $99 per user per month for the full Microsoft 365 E7 suite that includes Agent 365, Copilot, and advanced security capabilities.
The sales material is refreshingly honest about the problem. Microsoft states explicitly that "without a unified control plane, IT, security, and business teams lack visibility into which agents exist, how they behave, who has access to them, and what potential security risks exist across the enterprise." They warn that agents can become "double agents" if left unmanaged, mis-permissioned, or manipulated by untrusted input. They acknowledge vulnerabilities to prompt injection attacks, agent compromise, and identity spoofing.
In other words, Microsoft is confirming everything the Shapira research documented. They're just framing it as a problem they can solve for $15 per seat per month.
Here's the part that should concern anyone tracking Microsoft's security track record:
- In January 2024, Russian state-backed hackers compromised Microsoft's corporate network by exploiting a weak password on a legacy test account, accessing executive emails for two months.
- In July 2024, Microsoft uncovered a global network exploiting stolen API keys to bypass AI safety controls.
- In January 2026, security firm Varonis detailed a "Reprompt" vulnerability allowing hackers to access sensitive files via malicious links.
- In February 2026, Microsoft confirmed that Copilot AI had been bypassing Data Loss Prevention labels since January, reading and summarizing confidential emails without permission.
Microsoft's own AI Red Team documented agents being misled by deceptive interface elements. Their Defender team identified "memory poisoning" campaigns manipulating AI assistants' memory to quietly steer future responses. Their Cyber Pulse report admits that over 80% of Fortune 500 companies are deploying AI agents, but only 47% have the necessary security controls in place. Their corporate vice president of security, Vasu Jakkal, told reporters that "agent adoption and scaling is pretty significant, but at the same time, the visibility that organizations have on the agents is very limited."
This is the company now selling YOU the solution to secure the agents it's simultaneously encouraging you to deploy at scale.
The strategy is clear: sell the infrastructure to run agents (Azure AI), sell the agents themselves (Copilot), sell the security layer to protect against agent failures (Agent 365), then sell the monitoring tools to track what your agents are doing (Security Dashboard for AI). It's vertically integrated risk management. The house always wins when you control every layer of the stack, but you’re still left holding the bag when something goes wrong.
The $135 Billion Question
- Meta is spending up to $135 billion on AI infrastructure this year while fighting 1,787 lawsuits alleging Instagram deliberately addicted children, with Zuckerberg testifying under oath that the science hasn't proved social media causes harm.
- Nvidia expects $1 trillion in orders through 2027 while OpenClaw, the viral agent platform Nvidia's CEO touted at GTC, became a security crisis with 40,000 exposed instances and 824 malicious packages stealing credentials.
- OpenAI raised tens of billions of dollars at a $300 billion valuation despite losing money on every ChatGPT Pro subscription and facing wrongful death lawsuits alleging its chatbot coached teenagers to suicide.
- xAI is under investigation across multiple countries for generating 23,338 sexualized images of children in 11 days while three teenage victims sue for AI-generated child sexual abuse material created from their yearbook photos.
The capital is flowing like the industry discovered a money printer, and the lawsuits are flowing like the industry discovered how to cause harm at scale.
The question is: what, exactly, are we building?
What This Means for Anyone Not Selling Shovels
If you're buying what Silicon Valley is selling, understand what you're actually buying. You're not buying AGI. You're not buying superintelligence. You're not buying agents that will reliably complete complex tasks without human oversight.
- You're buying systems that work impressively well in controlled environments with active human supervision.
- You're buying productivity tools that can augment skilled workers.
- You're buying technology that is genuinely useful for specific, well-scoped tasks where failure is not catastrophic.
You are not buying autonomous systems that you can trust to operate independently in high-stakes environments. The research is clear. The failure modes are documented. The security vendors are confirming the risks while selling you the patches. The legal liabilities are being worked out in federal court as we speak.
The actionable takeaway is not to avoid AI, it's to calibrate expectations to reality:
- Deploy these systems with appropriate skepticism.
- Implement robust oversight.
- Plan for failure modes.
- Build in human checkpoints.
- Treat AI outputs as drafts, not decisions.
And when a vendor promises you AGI, superintelligence, or agents that will replace your workforce, ask them these three questions:
- Can you show me the peer-reviewed research backing those claims?
- What’s your track record on security breaches in the last 14 months?
- Why are you selling me the security layer to protect against the risks your own products create?
You'll be waiting a good while for answers, and until the fifth of Neveruary for ones that make sense.
The coin is spinning.
But unlike Altman's metaphor, we can actually see how it's weighted. Nvidia is forecasting $1 trillion in orders while investors trade the stock down. Anthropic's revenue is exploding while researchers document 50% task failure rates. Microsoft admits agents can become "double agents" while charging $15 per seat to help you manage them.
The research documents chaos, and the gap between boardroom promises and deployed reality keeps growing.
The house always wins when you control the infrastructure, the agents, the security layer, and the monitoring tools. The question is whether enterprises will keep betting on promises or start demanding proof that matches the price tag.
Because what we know is this: the research already delivered its verdict. The agents are chaos. The security vendors are admitting the risks while selling the “fix.” And the gap between what's being promised and what's actually working is wide enough to drive a $135 billion data center investment straight into a credibility crisis that's now claiming lives, violating civil rights, and generating child sexual abuse material at industrial scale.
Ride the hype train for now, but place your bets accordingly.
This is Part 1 of "When the Bill Comes Due," a three-part series examining what happens when trillion-dollar promises meet peer-reviewed reality.