ARAMMON NEWS

Arthur unveils Bench, an open-source AI model evaluator

August 17, 2023

San Francisco-based artificial intelligence (AI) startup Arthur has announced the launch of Arthur Bench, an open-source tool for evaluating and comparing the performance of large language models (LLMs) like OpenAI’s GPT-3.5 Turbo and Meta’s LLaMA 2.

“With Bench, we’ve created an open-source tool to help teams deeply understand the differences between LLM providers, different prompting and augmentation strategies, and custom training regimes,” said Adam Wenchel, co-founder and CEO of Arthur, in a press release.

How Arthur Bench works

Arthur Bench allows companies to test the performance of different language models on their specific use cases. It provides metrics to compare models on accuracy, readability, hedging, and other criteria.

For anyone who has used LLMs more than a few times, “hedging” is an especially noticeable issue. It occurs when an LLM pads its answer with language summarizing or alluding to its terms of service or programming constraints, such as “as an AI language model…”, which is typically not germane to the response the user wants.

“Those are kind of some of the subtle differences of behaviors that may be relevant for your particular application,” Wenchel said in an exclusive video interview with VentureBeat.

Screenshot of Arthur Bench comparison of the hedging tendencies in various LLM responses (shown in the table at bottom). Credit: Arthur
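
To make the idea of a hedging metric concrete, here is a minimal sketch of how one might flag hedged responses and compute a hedging rate across a batch. It is an illustration only, not Arthur Bench's implementation: the phrase list and the function names `is_hedged` and `hedging_rate` are assumptions made for this example.

```python
import re

# Hypothetical phrase list for illustration; not the criteria Arthur Bench uses.
HEDGE_PATTERNS = [
    r"\bas an ai( language)? model\b",
    r"\bi (cannot|can['’]t|am unable to)\b",
    r"\bi do not have (access|the ability)\b",
]
_HEDGE_RE = re.compile("|".join(HEDGE_PATTERNS), re.IGNORECASE)


def is_hedged(response: str) -> bool:
    """Return True if the response contains any of the hedging phrases."""
    return _HEDGE_RE.search(response) is not None


def hedging_rate(responses: list[str]) -> float:
    """Fraction of responses flagged as hedged (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(is_hedged(r) for r in responses) / len(responses)
```

Running `hedging_rate` over each model's answers to the same prompts would give one number per model, which is roughly the kind of side-by-side comparison the screenshot describes. A keyword match like this is crude; a production metric would likely need something more robust.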

Arthur has included a number of starter criteria upon which to compare LLM performance, but because the tool is open source, enterprises using it may add their own criteria to fit their needs.

“You can grab the last 100 questions your users asked and run them against all models. Then Arthur Bench will highlight where answers were wildly different so you can manually review those,” explained Wenchel.
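
The workflow Wenchel describes can be sketched in a few lines. The example below is an assumption-laden illustration rather than Arthur Bench's API: each model is represented by a hypothetical callable that takes a question and returns an answer, and "wildly different" is approximated with the character-level similarity ratio from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher
from itertools import combinations


def min_pairwise_similarity(answers: list[str]) -> float:
    """Lowest similarity ratio (0 to 1) between any two answers."""
    ratios = [
        SequenceMatcher(None, a, b).ratio()
        for a, b in combinations(answers, 2)
    ]
    # With fewer than two answers there is nothing to disagree about.
    return min(ratios) if ratios else 1.0


def flag_divergent(questions, models, threshold=0.5):
    """Ask every model every question; return the questions whose answers diverge.

    `models` maps a model name to a callable(question) -> answer.
    The callables and the 0.5 threshold are placeholders for this sketch.
    """
    flagged = []
    for question in questions:
        answers = {name: ask(question) for name, ask in models.items()}
        score = min_pairwise_similarity(list(answers.values()))
        if score < threshold:
            flagged.append((question, score, answers))
    return flagged
```

The flagged list is what a person would then review by hand. String similarity is a stand-in here; comparing embeddings or using a grading model would catch answers that differ in wording but agree in substance.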

The goal is to help enterprises make informed decisions when adopting AI. Arthur Bench accelerates benchmarking and translates academic measures into real-world business impact.

The company uses a combination of statistical measures and scores, as well as assessments by other LLMs, to grade the responses of the LLMs under evaluation side by side.
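
As a rough illustration of blending a statistical measure with an LLM-based grade, the sketch below scores a candidate answer against a reference using token-overlap F1 and optionally mixes in a score from a grader. Everything here is assumed for the example: the choice of F1, the `judge` callable (a stand-in for a call to a grading LLM that returns a value between 0 and 1), and the equal weighting are not taken from Arthur Bench.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between a candidate answer and a reference answer."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears in both texts.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def combined_score(candidate, reference, judge=None, weight=0.5):
    """Blend the statistical score with an optional grader score.

    `judge` is a hypothetical callable(candidate, reference) -> float in [0, 1].
    """
    stat = token_f1(candidate, reference)
    if judge is None:
        return stat
    return weight * stat + (1 - weight) * judge(candidate, reference)
```

Computing `combined_score` for each model's answer to the same prompt yields comparable numbers that can be laid out side by side.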

Arthur Bench in action

Wenchel said financial services firms have already been using Arthur Bench to generate investment theses and analysis more quickly.

Vehicle manufacturers have taken equipment manuals containing many pages of highly specific technical guidance and used Arthur Bench to create LLMs that answer customer queries quickly and accurately from those manuals, with fewer hallucinations.

Another customer, the enterprise media and publishing platform Axios HQ, is also using Arthur Bench on its product development side.

“Arthur Bench helped us develop an internal framework to scale and standardize LLM evaluation across features, and to describe performance to the Product team with meaningful and interpretable metrics,” said Priyanka Oberoi, staff data scientist at Axios HQ.

Arthur is open-sourcing Bench so anyone can use and contribute to it for free. The startup believes an open-source approach leads to the best products. There will still be opportunities to monetize through team dashboards.

Collaborations with AWS and Cohere

Arthur also announced a hackathon with Amazon Web Services (AWS) and Cohere to encourage developers to build new metrics for Arthur Bench.

Wenchel said AWS’s Bedrock environment for choosing between and deploying a variety of LLMs was “very philosophically aligned” with Arthur Bench.

“How do you rationally decide which LLMs are right for you?” Wenchel said. “This complements the AWS strategy very well.”

The company launched Arthur Shield earlier this year to monitor large language models for hallucinations and other issues.

© 2023 Arammon News - AI NEWS Arammon News.
