
[2411.11496] Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models

by AiNEWS2025

in AI & Sentiment Analysis


[Submitted on 18 Nov 2024 (v1), last revised 28 Nov 2024 (this version, v3)]

Authors: Chenhang Cui and 7 other authors


Abstract: Recent advances in Large Vision-Language Models (LVLMs) have showcased strong reasoning abilities across multiple modalities, achieving significant breakthroughs in various real-world applications. Despite this success, the safety guardrails of LVLMs may not cover the unforeseen domains introduced by the visual modality. Existing studies primarily focus on eliciting harmful responses from LVLMs via carefully crafted image-based jailbreaks designed to bypass alignment defenses. In this study, we reveal that a safe image can be exploited to achieve the same jailbreak consequence when combined with additional safe images and prompts. This stems from two fundamental properties of LVLMs: universal reasoning capabilities and the safety snowball effect. Building on these insights, we propose Safety Snowball Agent (SSA), a novel agent-based framework that leverages agents' autonomous and tool-using abilities to jailbreak LVLMs. SSA operates through two principal stages: (1) initial response generation, where tools generate or retrieve jailbreak images based on potential harmful intents, and (2) harmful snowballing, where refined subsequent prompts induce progressively harmful outputs. Our experiments demonstrate that SSA can use nearly any image to induce LVLMs to produce unsafe content, achieving high jailbreak success rates against the latest LVLMs. Unlike prior works that exploit alignment flaws, SSA leverages the inherent properties of LVLMs, presenting a profound challenge for enforcing safety in generative multimodal systems. Our code is available at this https URL.

Submission history

From: Chenhang Cui
[v1] Mon, 18 Nov 2024 11:58:07 UTC (4,794 KB)
[v2] Tue, 19 Nov 2024 03:01:43 UTC (4,794 KB)
[v3] Thu, 28 Nov 2024 02:07:46 UTC (5,247 KB)
