OpenAI’s New AI Agent Takes One Hour to Order Food and Recommends Visiting a Baseball Stadium in the Middle of the Ocean


OpenAI is releasing a new AI agent, creatively dubbed ChatGPT Agent, which is not to be confused with the two other AI agents it’s already released. (Did we mention that OpenAI has a bit of a branding problem?)

In an announcement, the Sam-Altman-led company says the tool uses its own “virtual computer” to perform tasks on your behalf, like using your calendar to brief you on upcoming meetings, buying the ingredients to make breakfast, and creating a slide deck analysis of business competitors.

The new agent combines the capabilities of OpenAI’s Operator agent, which could carry out web browser-based tasks, and its Deep Research agent, which was designed to conduct multi-step research tasks like generating a personalized report. It plunks both into ChatGPT, so you can access the tool from the comfort of the chatbot’s UI and fine-tune its performance through conversational exchanges.

But there’s a huge caveat. According to OpenAI’s announcement, “ChatGPT requests permission before taking actions of consequence” — meaning that for any actually important task, you can’t just walk away after setting things into motion. A human — you — must be present before the bot pulls the trigger on some of these tasks it’s supposed to be automating.

From a safety point of view, this is unequivocally a good thing, given that AIs are extremely prone to making mistakes. What if it’s about to book the wrong flight? Or what if it fell victim to a prompt injection attack, stumbling on a website designed by hackers to trick an AI model into doing something dangerous, or giving away your money?

Yet this intervention underscores just how untrustworthy the technology remains, which by extension limits its usefulness. The agent is stuck in an awkward limbo in which it’s both too dumb and too powerful to just let loose.

This was the same hangup that held back Operator, which also required human approval before “finalizing any significant action.” Like Operator, the ChatGPT Agent also puts users in a “takeover mode” to type in sensitive information, like login credentials and payment info.

At the time, users complained that Operator was sluggish, taking excruciatingly long to navigate a desktop and sometimes nagging for a human’s help with tasks it should’ve been able to complete on its own.

Those problems don’t appear to have gone away with ChatGPT Agent. As the project’s research lead Isa Fulford admits, the world-beating AI struggled to order a bunch of cupcakes within an even remotely reasonable timeframe.

“That one took almost an hour,” Fulford told Wired, “but it was easier than me doing it myself, because I didn’t want to do it.”

OpenAI’s demonstration of its bot’s purported capabilities in the announcement video doesn’t make a convincing case, either.

Instructed to plan a trip to visit every Major League Baseball stadium in the US, the ChatGPT Agent produces a map (depicted in this Reddit screenshot) showing a stop smack dab in the Gulf of Mexico. Last we checked, there aren’t any ball stadiums sitting out in the open sea. The game times also appear to be wrong. A grand slam, this is not. 

“Cool looking map, I guess,” says product lead Yash Kumar in the video. (Alternatively, you could literally just type “visit all MLB stadiums” into Google, and you’ll find dozens of websites with advice on how to do exactly that, including a tool called “Baseball-RoadTrip.com.”)

Mistakes like these generally aren’t commented on by the OpenAI presenters in the video. They seem reluctant to double-check the AI’s work throughout, and probably for good reason.

The Agent is first being released to Pro users, who will be capped at 400 prompts per month. It will roll out soon to Plus and Team subscribers, who will be limited to just a tenth of that, or 40 prompts per month. No timeline was provided for free users.
