The Download: Rethinking AI benchmarks, and the ethics of AI agents

Each time a new AI model is released, it's typically touted as acing its performance against a series of benchmarks. OpenAI's GPT-4o, for example, was launched in May with a compilation of results that showed its performance topping every other AI company's latest model in several tests.

The problem is that these benchmarks are poorly designed, their results are hard to replicate, and the metrics they use are frequently arbitrary, according to new research. That matters because AI models' scores against these benchmarks determine the level of scrutiny they receive.

AI companies frequently cite benchmarks as testament to a new model's success, and those benchmarks already form part of some governments' plans for regulating AI. But right now, they might not be good enough to use that way, and researchers have some ideas for how they should be improved.

—Scott J Mulligan

We need to start wrestling with the ethics of AI agents

Generative AI models have become remarkably good at conversing with us and creating images, videos, and music for us, but they're not all that good at doing things for us.

AI agents promise to change that. Last week, researchers published a new paper explaining how they trained simulation agents to replicate 1,000 people's personalities with stunning accuracy.

AI models that mimic you could go out and act on your behalf in the near future. If such tools become cheap and easy to build, they will raise lots of new ethical concerns, but two in particular stand out.

—James O’Donnell

