[2406.11939] From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline

[Submitted on 17 Jun 2024 (v1), last revised 14 Oct 2024 (this version, v2)]

Reducing Labeling Costs in Sentiment Analysis via Semi-Supervised Learning

arXiv:2410.11355v1 Announce Type: cross Abstract: Labeling datasets is a noteworthy problem in machine learning, both in terms of cost and ...

[2406.11109] Investigating Annotator Bias in Large Language Models for Hate Speech Detection

[Submitted on 17 Jun 2024 (v1), last revised 12 Oct 2024 (this version, v3)]

A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs

[Submitted on 19 Jun 2024 (v1), last revised 10 Oct 2024 (this version, v2)]

LLMs Robustness with Incorrect Multiple-Choice Options

[Submitted on 27 Aug 2024 (v1), last revised 10 Oct 2024 (this version, v2)]

[2402.15048] Unlocking the Power of Large Language Models for Entity Alignment

[Submitted on 23 Feb 2024 (v1), last revised 9 Oct 2024 (this version, v2)]

[2405.18348] Can Automatic Metrics Assess High-Quality Translations?

[Submitted on 28 May 2024 (v1), last revised 10 Oct 2024 (this version, v2)]

Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model

[Submitted on 30 May 2024 (v1), last revised 9 Oct 2024 (this version, v5)]

[2401.15884] Corrective Retrieval Augmented Generation

[Submitted on 29 Jan 2024 (v1), last revised 7 Oct 2024 (this version, v3)]

PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs

arXiv:2410.05265v1 Announce Type: cross Abstract: Quantization is important for deploying Large Language Models (LLMs) by enhancing memory efficiency and inference ...