...

[2406.11939] From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline


View a PDF of the paper titled From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline, by Tianle Li and seven other authors


Abstract: The rapid evolution of Large Language Models (LLMs) has outpaced the development of model evaluation, highlighting the need for continuous curation of new, challenging benchmarks. However, manual curation of high-quality, human-aligned benchmarks is expensive and time-consuming. To address this, we introduce BenchBuilder, an automated pipeline that leverages LLMs to curate high-quality, open-ended prompts from large, crowd-sourced datasets, enabling continuous benchmark updates without a human in the loop. We apply BenchBuilder to datasets such as Chatbot Arena and WildChat-1M, extracting challenging prompts and using LLM-as-a-Judge for automatic model evaluation. To validate benchmark quality, we propose new metrics to measure a benchmark's alignment with human preferences and its ability to separate models. We release Arena-Hard-Auto, a benchmark consisting of 500 challenging prompts curated by BenchBuilder. Arena-Hard-Auto provides 3x higher separation of model performances compared to MT-Bench and achieves 98.6% correlation with human preference rankings, all at a cost of $20. Our work sets a new framework for the scalable curation of automated benchmarks from extensive data.
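The two validation ideas named in the abstract, alignment with human preference rankings and the ability to separate models, can be illustrated with a minimal sketch. The abstract does not define the paper's metrics, so the code below is an assumption rather than the authors' method: it uses Spearman rank correlation as a stand-in for ranking agreement and the fraction of model pairs with non-overlapping confidence intervals as a stand-in for separation. All model names, scores, and intervals are hypothetical.

```python
from itertools import combinations


def ranks(scores):
    """Return 1-based ranks (highest score gets rank 1); assumes no ties."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(order)}


def spearman(a, b):
    """Spearman rank correlation of two score dicts with the same keys (no ties)."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    d2 = sum((ra[k] - rb[k]) ** 2 for k in ra)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def separability(intervals):
    """Fraction of model pairs whose (low, high) confidence intervals do not overlap."""
    pairs = list(combinations(intervals, 2))
    separated = sum(
        1
        for x, y in pairs
        if intervals[x][1] < intervals[y][0] or intervals[y][1] < intervals[x][0]
    )
    return separated / len(pairs)


# Hypothetical benchmark scores and human preference ratings for four models.
benchmark = {"model_a": 82.0, "model_b": 61.5, "model_c": 47.0, "model_d": 23.1}
human = {"model_a": 1250, "model_b": 1190, "model_c": 1200, "model_d": 1100}

# Hypothetical confidence intervals on the benchmark scores.
intervals = {
    "model_a": (79.0, 85.0),
    "model_b": (52.0, 65.0),
    "model_c": (44.0, 55.0),
    "model_d": (20.0, 26.0),
}

print(f"rank correlation: {spearman(benchmark, human):.3f}")
print(f"separability: {separability(intervals):.3f}")
```

In this toy data the benchmark swaps the order of two models relative to the human ratings, which lowers the rank correlation, and one pair of intervals overlaps, which lowers the separability. A benchmark that ranks models as humans do and yields tight, distinct intervals would score close to 1 on both.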

Submission history

From: Tianle Li
[v1] Mon, 17 Jun 2024 17:26:10 UTC (1,870 KB)
[v2] Mon, 14 Oct 2024 18:11:58 UTC (1,977 KB)
