An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models

[Submitted on 13 Jun 2024 (v1), last revised 4 Feb 2025 (this version, v2)]

Exploring the Role of Punctuation in Semantic Processing

[Submitted on 10 Jan 2025 (v1), last revised 2 Feb 2025 (this version, v3)]

[2501.07927] Gandalf the Red: Adaptive Security for LLMs

[Submitted on 14 Jan 2025 (v1), last revised 2 Feb 2025 (this version, v2)] Authors: Niklas Pfister, Václav Volhejn, Manuel Knott, ...

Language Bias in Self-Supervised Learning For Automatic Speech Recognition

arXiv:2501.19321v1 Abstract: Self-supervised learning (SSL) is used in deep learning to train on large datasets without the ...

Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking

[Submitted on 28 Jan 2025 (v1), last revised 30 Jan 2025 (this version, v2)]

DFPE: A Diverse Fingerprint Ensemble for Enhancing LLM Performance

arXiv:2501.17479v1 Abstract: Large Language Models (LLMs) have shown remarkable capabilities across various natural language processing tasks but ...

Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation

arXiv:2501.17433v1 Abstract: Recent research shows that Large Language Models (LLMs) are vulnerable to harmful fine-tuning attacks ...

Direct Schema Linking via Question Enrichment in Text-to-SQL

[Submitted on 25 Sep 2024 (v1), last revised 28 Jan 2025 (this version, v2)]

A Multimodal Textbook for Vision-Language Pretraining

[Submitted on 1 Jan 2025 (v1), last revised 27 Jan 2025 (this version, v3)]

Zero-Shot Decision Tree Construction via Large Language Models

arXiv:2501.16247v1 Abstract: This paper introduces a novel algorithm for constructing decision trees using large language models (LLMs) ...