[2406.11939] From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline
![[2406.11939] From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline [2406.11939] From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline](https://i0.wp.com/arxiv.org/static/browse/0.3.4/images/arxiv-logo-fb.png?w=1200&resize=1200,700&ssl=1)
[Submitted on 17 Jun 2024 (v1), last revised 14 Oct 2024 (this version, v2)]
Reducing Labeling Costs in Sentiment Analysis via Semi-Supervised Learning
arXiv:2410.11355v1 Announce Type: cross Abstract: Labeling datasets is a noteworthy problem in machine learning, both in terms of cost and ...
[2406.11109] Investigating Annotator Bias in Large Language Models for Hate Speech Detection
[Submitted on 17 Jun 2024 (v1), last revised 12 Oct 2024 (this version, v3)]
A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs
[Submitted on 19 Jun 2024 (v1), last revised 10 Oct 2024 (this version, v2)]
LLMs Robustness with Incorrect Multiple-Choice Options
[Submitted on 27 Aug 2024 (v1), last revised 10 Oct 2024 (this version, v2)]
[2402.15048] Unlocking the Power of Large Language Models for Entity Alignment
[Submitted on 23 Feb 2024 (v1), last revised 9 Oct 2024 (this version, v2)]
[2405.18348] Can Automatic Metrics Assess High-Quality Translations?
[Submitted on 28 May 2024 (v1), last revised 10 Oct 2024 (this version, v2)]
Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model
[Submitted on 30 May 2024 (v1), last revised 9 Oct 2024 (this version, v5)]
[2401.15884] Corrective Retrieval Augmented Generation
[Submitted on 29 Jan 2024 (v1), last revised 7 Oct 2024 (this version, v3)]
PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs
arXiv:2410.05265v1 Announce Type: cross Abstract: Quantization is important for deploying Large Language Models (LLMs) by enhancing memory efficiency and inference ...