A Survey of Tasks, Datasets, Models, and Challenges

[Submitted on 25 Oct 2024 (v1), last revised 30 Jul 2025 (this version, v3)]

Navigating Ideologies of Large Language Models

[Submitted on 24 Jun 2025 (v1), last revised 29 Jul 2025 (this version, v2)]

Dissecting Persona-Driven Reasoning in Language Models via Activation Patching

arXiv:2507.20936v1 Announce Type: cross Abstract: Large language models (LLMs) exhibit remarkable versatility in adopting diverse personas. In this study, we ...

Unveiling the reasoning behaviour of medical Large Language Models

[Submitted on 20 Dec 2024 (v1), last revised 28 Jul 2025 (this version, v2)]

[2412.13666] Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation

[Submitted on 18 Dec 2024 (v1), last revised 25 Jul 2025 (this version, v2)]

Dynamic and Generalizable Process Reward Modeling

arXiv:2507.17849v1 Announce Type: new Abstract: Process Reward Models (PRMs) are crucial for guiding Large Language Models (LLMs) in complex scenarios ...

[2505.22334] Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start

[Submitted on 28 May 2025 (v1), last revised 23 Jul 2025 (this version, v2)]

[2503.03460] Visualising Policy-Reward Interplay to Inform Zeroth-Order Preference Optimisation of Large Language Models

[Submitted on 5 Mar 2025 (v1), last revised 23 Jul 2025 (this version, v2)]

[2507.15844] Hierarchical Budget Policy Optimization for Adaptive Reasoning

[Submitted on 21 Jul 2025 (v1), last revised 22 Jul 2025 (this version, v2)]

[2507.15007] Hear Your Code Fail, Voice-Assisted Debugging for Python

[Submitted on 20 Jul 2025 (v1), last revised 22 Jul 2025 (this version, v2)]