Multi-Agent LLM Defense against Jailbreak Attacks

[Submitted on 2 Mar 2024 (v1), last revised 14 Nov 2024 (this version, v2)]

[2407.15339] Deep Learning for Economists

[Submitted on 22 Jul 2024 (v1), last revised 13 Nov 2024 (this version, v3)]

[2411.07820] Query Optimization for Parametric Knowledge Refinement in Retrieval-Augmented Large Language Models

[Submitted on 12 Nov 2024 (v1), last revised 13 Nov 2024 (this version, v2)]

[2405.00722] LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study

[Submitted on 26 Apr 2024 (v1), last revised 12 Nov 2024 (this version, v2)]

[2311.07468] An Analysis and Mitigation of the Reversal Curse

[Submitted on 13 Nov 2023 (v1), last revised 10 Nov 2024 (this version, v3)]

A Technology Probe for Resolving Value Conflicts through Expert-Driven and User-Driven Strategies in AI Companion Applications


LLMs as Research Tools: A Large Scale Survey of Researchers' Usage and Perceptions

arXiv:2411.05025v1 Announce Type: new Abstract: The rise of large language models (LLMs) has led many researchers to think about their ...

[2406.11944] Transcoders Find Interpretable LLM Feature Circuits

[Submitted on 17 Jun 2024 (v1), last revised 6 Nov 2024 (this version, v2)]

A Chinese Dialogue Dataset Towards Multi-turn Topic-driven Conversation

[Submitted on 3 Mar 2021 (v1), last revised 7 Nov 2024 (this version, v3)]

How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis

arXiv:2411.04105v1 Announce Type: cross Abstract: Large language models (LLMs) have shown strong performance on tasks that require planning and reasoning. ...