Benchmarking LLM Proficiency in Scientific Literature Analysis

[Submitted on 4 Mar 2024 (v1), last revised 18 Oct 2024 (this version, v5)] Authors: Hengxing Cai, Xiaochen Cai, Junhan Chang, ...

Inferring Safe Actions for LLM-Based Agents Through Preemptive Evaluation and Human Feedback

[Submitted on 16 Jul 2024 (v1), last revised 17 Oct 2024 (this version, v2)]

Lightweight Passage Retrieval for Open Domain Multi-Document Summarization

[Submitted on 18 Jun 2024 (v1), last revised 17 Oct 2024 (this version, v2)]

[2310.20246] Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations

[Submitted on 31 Oct 2023 (v1), last revised 16 Oct 2024 (this version, v5)]

[2406.11939] From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline

[Submitted on 17 Jun 2024 (v1), last revised 14 Oct 2024 (this version, v2)]

Reducing Labeling Costs in Sentiment Analysis via Semi-Supervised Learning

arXiv:2410.11355v1 Announce Type: cross Abstract: Labeling datasets is a notable problem in machine learning, both in terms of cost and ...

[2406.11109] Investigating Annotator Bias in Large Language Models for Hate Speech Detection

[Submitted on 17 Jun 2024 (v1), last revised 12 Oct 2024 (this version, v3)]

A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs

[Submitted on 19 Jun 2024 (v1), last revised 10 Oct 2024 (this version, v2)]

LLMs Robustness with Incorrect Multiple-Choice Options

[Submitted on 27 Aug 2024 (v1), last revised 10 Oct 2024 (this version, v2)]

[2402.15048] Unlocking the Power of Large Language Models for Entity Alignment

[Submitted on 23 Feb 2024 (v1), last revised 9 Oct 2024 (this version, v2)]