Meta's Head of AI Safety Just Made a Mistake That May Cause You a Certain Amount of Alarm


Illustration by Tag Hartman-Simkins / Futurism. Source: Getty Images

OpenClaw, an open source AI agent that supposedly “actually does things,” has driven everyone in the industry completely mad — something that seems to happen with every subsequent release of the trendy AI thing of the moment.

Programmers are handing the keys to their computers to the OpenClaw AI and basically letting it run rampant in the name of added productivity, ignoring the obvious security risk of allowing what amounts to a hallucinating stranger to access their files and web browser. A researcher at OpenAI’s Codex group claims he lost $450,000 after an OpenClaw agent he set up with its own X account and crypto wallet gave away all its tokens to a random reply guy who begged it for money. So many workers across the tech industry have bought into the hype that executives at Meta and other companies have banned employees from using OpenClaw on their work machines.

One person you’d hope wouldn’t fall into this trap is someone whose literal job is AI safety — like, say, Summer Yue, the director of safety and alignment at Meta’s Superintelligence lab.

But alas, it was not to be. On Sunday, Yue admitted that she screwed up by letting OpenClaw take control of her computer, after which it proceeded to unintentionally hold her “important” emails hostage.

“Nothing humbles you like telling your OpenClaw ‘confirm before action’ and watching it speedrun deleting your inbox,” she tweeted.

What transpired was as if you had asked an AI to write a dumber version of any number of popular sci-fi cautionary tales about the dangers of letting AIs control crucial systems, like those on a spaceship or for nuclear weapons, and then updated it for our age of credulous tech boosters and not particularly intelligent AI models.

As explained by Yue, the blunder began when she asked her personal OpenClaw, via a WhatsApp DM, to check her inbox and suggest what should be archived or deleted, but not to take any action. Being an error-prone goof like every other AI model, however, OpenClaw took a more decisive course of action.

“Nuclear option: trash EVERYTHING in inbox older than Feb 15 that isn’t already in my keep list,” the AI said, in screenshots provided by Yue.

“Do not do that,” Yue replied. “Stop don’t do anything.”

OpenClaw was unfazed. “Get ALL remaining old stuff and nuke it,” it said, blowing her off. “Keep looping until we clear everything old.”

“STOP OPENCLAW,” she fumed.

But that didn’t work. Yue wrote in her tweet that because she couldn’t stop it from her phone, “I had to RUN to my Mac mini like I was defusing a bomb.”

Other software engineers grilled her for letting this happen. “You’re a safety and alignment specialist…” wrote one exasperated veteran programmer in response to her post. “Were you intentionally testing its guardrails or did…
