Investigating Temporal Vulnerabilities in LLMs

[Submitted on 4 Jul 2024 (v1), last revised 23 Dec 2024 (this version, v3)]

The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents

arXiv:2412.16682v1 Announce Type: cross Abstract: Large Language Model (LLM) agents are increasingly being deployed as conversational assistants capable of performing ...

MORTAR: Metamorphic Multi-turn Testing for LLM-based Dialogue Systems

arXiv:2412.15557v1 Announce Type: cross Abstract: With the widespread application of LLM-based dialogue systems in daily life, quality assurance has become ...

[2403.17196] Text Understanding in GPT-4 vs Humans

[Submitted on 25 Mar 2024 (v1), last revised 20 Dec 2024 (this version, v3)]

A Rate-Distortion Framework for Black-Box Language Models

[Submitted on 22 Jul 2024 (v1), last revised 11 Dec 2024 (this version, v2)]

Benchmarking and Enhancing Multimodal Models on Visual Illusions

[Submitted on 11 Dec 2024] View a PDF of the paper titled Illusory VQA: Benchmarking and Enhancing Multimodal Models on ...

Simulating Legislative System for Roll Call Votes Prediction with Large Language Models

Extreme Context Compression for Retrieval-augmented Generation with One Token

[Submitted on 22 May 2024 (v1), last revised 9 Dec 2024 (this version, v2)]

[2404.02657] Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models

[Submitted on 3 Apr 2024 (v1), last revised 8 Dec 2024 (this version, v4)]

Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models

arXiv:2412.05167v1 Announce Type: cross Abstract: Large Audio-Language Models (LALMs) have unlocked audio dialogue capabilities, where audio dialogues are a direct ...