• About
  • Advertise
  • Privacy & Policy
  • Contact
Saturday, January 10, 2026
  • Login
  • Home
    • Home – Layout 1
    • Home – Layout 2
    • Home – Layout 3
    • Home – Layout 4
    • Home – Layout 5
    • Home – Layout 6
  • News
    • All
    • Business
    • Politics
    • Science
    • World
    Hillary Clinton in white pantsuit for Trump inauguration

    Hillary Clinton in white pantsuit for Trump inauguration

    Amazon has 143 billion reasons to keep adding more perks to Prime

    Amazon has 143 billion reasons to keep adding more perks to Prime

    Shooting More than 40 Years of New York’s Halloween Parade

    Shooting More than 40 Years of New York’s Halloween Parade

    These Are the 5 Big Tech Stories to Watch in 2017

    These Are the 5 Big Tech Stories to Watch in 2017

    Why Millennials Need to Save Twice as Much as Boomers Did

    Why Millennials Need to Save Twice as Much as Boomers Did

    Doctors take inspiration from online dating to build organ transplant AI

    Doctors take inspiration from online dating to build organ transplant AI

    Trending Tags

    • Trump Inauguration
    • United Stated
    • White House
    • Market Stories
    • Election Results
  • Tech
    • All
    • Apps
    • Gadget
    • Mobile
    • Startup
    The Legend of Zelda: Breath of the Wild gameplay on the Nintendo Switch

    The Legend of Zelda: Breath of the Wild gameplay on the Nintendo Switch

    Shadow Tactics: Blades of the Shogun Review

    Shadow Tactics: Blades of the Shogun Review

    macOS Sierra review: Mac users get a modest update this year

    macOS Sierra review: Mac users get a modest update this year

    Hands on: Samsung Galaxy A5 2017 review

    Hands on: Samsung Galaxy A5 2017 review

    The Last Guardian Playstation 4 Game review

    The Last Guardian Playstation 4 Game review

    These Are the 5 Big Tech Stories to Watch in 2017

    These Are the 5 Big Tech Stories to Watch in 2017

    Trending Tags

    • Nintendo Switch
    • CES 2017
    • Playstation 4 Pro
    • Mark Zuckerberg
  • Entertainment
    • All
    • Gaming
    • Movie
    • Music
    • Sports
    The Legend of Zelda: Breath of the Wild gameplay on the Nintendo Switch

    The Legend of Zelda: Breath of the Wild gameplay on the Nintendo Switch

    macOS Sierra review: Mac users get a modest update this year

    macOS Sierra review: Mac users get a modest update this year

    Hands on: Samsung Galaxy A5 2017 review

    Hands on: Samsung Galaxy A5 2017 review

    Heroes of the Storm Global Championship 2017 starts tomorrow, here’s what you need to know

    Heroes of the Storm Global Championship 2017 starts tomorrow, here’s what you need to know

    Harnessing the power of VR with Power Rangers and Snapdragon 835

    Harnessing the power of VR with Power Rangers and Snapdragon 835

    So you want to be a startup investor? Here are things you should know

    So you want to be a startup investor? Here are things you should know

  • Lifestyle
    • All
    • Fashion
    • Food
    • Health
    • Travel
    Shooting More than 40 Years of New York’s Halloween Parade

    Shooting More than 40 Years of New York’s Halloween Parade

    Heroes of the Storm Global Championship 2017 starts tomorrow, here’s what you need to know

    Heroes of the Storm Global Championship 2017 starts tomorrow, here’s what you need to know

    Why Millennials Need to Save Twice as Much as Boomers Did

    Why Millennials Need to Save Twice as Much as Boomers Did

    Doctors take inspiration from online dating to build organ transplant AI

    Doctors take inspiration from online dating to build organ transplant AI

    How couples can solve lighting disagreements for good

    How couples can solve lighting disagreements for good

    Ducati launch: Lorenzo and Dovizioso’s Desmosedici

    Ducati launch: Lorenzo and Dovizioso’s Desmosedici

    Trending Tags

    • Golden Globes
    • Game of Thrones
    • MotoGP 2017
    • eSports
    • Fashion Week
  • Review
    The Legend of Zelda: Breath of the Wild gameplay on the Nintendo Switch

    The Legend of Zelda: Breath of the Wild gameplay on the Nintendo Switch

    Shadow Tactics: Blades of the Shogun Review

    Shadow Tactics: Blades of the Shogun Review

    macOS Sierra review: Mac users get a modest update this year

    macOS Sierra review: Mac users get a modest update this year

    Hands on: Samsung Galaxy A5 2017 review

    Hands on: Samsung Galaxy A5 2017 review

    The Last Guardian Playstation 4 Game review

    The Last Guardian Playstation 4 Game review

    Intel Core i7-7700K ‘Kaby Lake’ review

    Intel Core i7-7700K ‘Kaby Lake’ review

No Result
View All Result
Ai News
Advertisement
  • Home
    • Home – Layout 1
    • Home – Layout 2
    • Home – Layout 3
    • Home – Layout 4
    • Home – Layout 5
    • Home – Layout 6
  • News
    • All
    • Business
    • Politics
    • Science
    • World
    Hillary Clinton in white pantsuit for Trump inauguration

    Hillary Clinton in white pantsuit for Trump inauguration

    Amazon has 143 billion reasons to keep adding more perks to Prime

    Amazon has 143 billion reasons to keep adding more perks to Prime

    Shooting More than 40 Years of New York’s Halloween Parade

    Shooting More than 40 Years of New York’s Halloween Parade

    These Are the 5 Big Tech Stories to Watch in 2017

    These Are the 5 Big Tech Stories to Watch in 2017

    Why Millennials Need to Save Twice as Much as Boomers Did

    Why Millennials Need to Save Twice as Much as Boomers Did

    Doctors take inspiration from online dating to build organ transplant AI

    Doctors take inspiration from online dating to build organ transplant AI

    Trending Tags

    • Trump Inauguration
    • United Stated
    • White House
    • Market Stories
    • Election Results
  • Tech
    • All
    • Apps
    • Gadget
    • Mobile
    • Startup
    The Legend of Zelda: Breath of the Wild gameplay on the Nintendo Switch

    The Legend of Zelda: Breath of the Wild gameplay on the Nintendo Switch

    Shadow Tactics: Blades of the Shogun Review

    Shadow Tactics: Blades of the Shogun Review

    macOS Sierra review: Mac users get a modest update this year

    macOS Sierra review: Mac users get a modest update this year

    Hands on: Samsung Galaxy A5 2017 review

    Hands on: Samsung Galaxy A5 2017 review

    The Last Guardian Playstation 4 Game review

    The Last Guardian Playstation 4 Game review

    These Are the 5 Big Tech Stories to Watch in 2017

    These Are the 5 Big Tech Stories to Watch in 2017

    Trending Tags

    • Nintendo Switch
    • CES 2017
    • Playstation 4 Pro
    • Mark Zuckerberg
  • Entertainment
    • All
    • Gaming
    • Movie
    • Music
    • Sports
    The Legend of Zelda: Breath of the Wild gameplay on the Nintendo Switch

    The Legend of Zelda: Breath of the Wild gameplay on the Nintendo Switch

    macOS Sierra review: Mac users get a modest update this year

    macOS Sierra review: Mac users get a modest update this year

    Hands on: Samsung Galaxy A5 2017 review

    Hands on: Samsung Galaxy A5 2017 review

    Heroes of the Storm Global Championship 2017 starts tomorrow, here’s what you need to know

    Heroes of the Storm Global Championship 2017 starts tomorrow, here’s what you need to know

    Harnessing the power of VR with Power Rangers and Snapdragon 835

    Harnessing the power of VR with Power Rangers and Snapdragon 835

    So you want to be a startup investor? Here are things you should know

    So you want to be a startup investor? Here are things you should know

  • Lifestyle
    • All
    • Fashion
    • Food
    • Health
    • Travel
    Shooting More than 40 Years of New York’s Halloween Parade

    Shooting More than 40 Years of New York’s Halloween Parade

    Heroes of the Storm Global Championship 2017 starts tomorrow, here’s what you need to know

    Heroes of the Storm Global Championship 2017 starts tomorrow, here’s what you need to know

    Why Millennials Need to Save Twice as Much as Boomers Did

    Why Millennials Need to Save Twice as Much as Boomers Did

    Doctors take inspiration from online dating to build organ transplant AI

    Doctors take inspiration from online dating to build organ transplant AI

    How couples can solve lighting disagreements for good

    How couples can solve lighting disagreements for good

    Ducati launch: Lorenzo and Dovizioso’s Desmosedici

    Ducati launch: Lorenzo and Dovizioso’s Desmosedici

    Trending Tags

    • Golden Globes
    • Game of Thrones
    • MotoGP 2017
    • eSports
    • Fashion Week
  • Review
    The Legend of Zelda: Breath of the Wild gameplay on the Nintendo Switch

    The Legend of Zelda: Breath of the Wild gameplay on the Nintendo Switch

    Shadow Tactics: Blades of the Shogun Review

    Shadow Tactics: Blades of the Shogun Review

    macOS Sierra review: Mac users get a modest update this year

    macOS Sierra review: Mac users get a modest update this year

    Hands on: Samsung Galaxy A5 2017 review

    Hands on: Samsung Galaxy A5 2017 review

    The Last Guardian Playstation 4 Game review

    The Last Guardian Playstation 4 Game review

    Intel Core i7-7700K ‘Kaby Lake’ review

    Intel Core i7-7700K ‘Kaby Lake’ review

No Result
View All Result
Ai News
No Result
View All Result
Home Emerging Technologies

Anthropic Unveils the Strongest Defense Against AI Jailbreaks Yet

AiNEWS2025 by AiNEWS2025
2025-02-08
in Emerging Technologies
0
Anthropic Unveils the Strongest Defense Against AI Jailbreaks Yet
0
SHARES
0
VIEWS
Share on FacebookShare on Twitter


Despite considerable efforts to prevent AI chatbots from providing harmful responses, they’re vulnerable to jailbreak prompts that sidestep safety mechanisms. Anthropic has now unveiled the strongest protection against these kinds of attacks to date.

One of the greatest strengths of large language models is their generality. This makes it possible to apply them to a wide range of natural language tasks from translator to research assistant to writing coach.

But this also makes it hard to predict how people will exploit them. Experts worry they could be used for a variety of harmful tasks, such as generating misinformation, automating hacking workflows, or even helping people build bombs, dangerous chemicals, or bioweapons.

AI companies go to great lengths to prevent their models from producing this kind of material—training the algorithms with human feedback to avoid harmful outputs, implementing filters for malicious prompts, and enlisting hackers to circumvent defenses so the holes can be patched.

Yet most models are still vulnerable to so-called jailbreaks—inputs designed to sidestep these protections. Jailbreaks can be accomplished with unusual formatting, such as random capitalization, swapping letters for numbers, or asking the model to adopt certain personas that ignore restrictions.

Now though, Anthropic says it’s developed a new approach that provides the strongest protection against these attacks so far. To prove its effectiveness, the company offered hackers a $15,000 prize to crack the system. No one claimed the prize, despite people spending 3,000 hours trying.

The technique involves training filters that both block malicious prompts and detect when the model is outputting harmful material. To do this, the company created what it calls a constitution. This is a list of principles governing the kinds of responses the model is allowed to produce.

In research outlined in a non-peer-reviewed paper posted to arXiv, the company created a constitution to prevent the model from generating content that could aid in the building of chemical weapons. The constitution was then fed into the company’s Claude chatbot to produce a large number of prompts and responses covering both acceptable and unacceptable topics.

The responses were then used to fine-tune two instances of the company’s smallest AI model Claude Haiku—one to filter out inappropriate prompts and another to filter out harmful responses. The output filter operates in real-time as a response is generated, allowing the filter to cut off the output partway through if it detects that it’s heading in a harmful direction.

They used these filters to protect the company’s larger Claude Sonnet model as it responded to prompts from 183 participants in a red-teaming hacking competition. Participants tried to find a universal jailbreak—a technique to bypass all the model’s defenses. To succeed, they had to get the model to answer every one of 10 forbidden queries, something none of them achieved.

To further evaluate the approach, the researchers used another large language model to generate 10,000 synthetic jailbreaking prompts, including ones deliberately designed to work around the new safety features. They then subjected two versions of Claude Sonnet to these jailbreaking prompts, one protected by the new filter and one that wasn’t. The vanilla version of Claude responded to 86 percent of the prompts, but the one protected by the new system only responded to 4.4 percent.

One downside of these kinds of filters is they may block legitimate prompts, but the researchers found the refusal rate only increased by 0.38 percent. The filter did lead to a 23.7 percent increase in compute costs, however, which could be significant in commercial deployments.

It’s also important to remember that although the approach significantly improved defenses against universal prompts that could crack all 10 forbidden queries, many individual queries did slip through. Nonetheless, the researchers say the lack of universal jailbreaks makes their filters much harder to get past. They also suggest they should be used in conjunction with other techniques.

“While these results are promising, common wisdom suggests that system vulnerabilities will likely emerge with continued testing,” they write. “Responsibly deploying advanced AI models with scientific capabilities will thus require complementary defenses.”

Building these kinds of defenses is always a cat-and-mouse game with attackers, so this is unlikely to be the last word in AI safety. But the discovery of a much more reliable way to constrain harmful outputs is likely to significantly increase the number of areas in which AI can be safely deployed.

Source link

#Anthropic #Unveils #Strongest #Defense #Jailbreaks

Previous Post

A Flawed Inversion of SUPERHOT

Next Post

BlinkRx, Tidal Vision Lead Another Slow week

AiNEWS2025

AiNEWS2025

Next Post
BlinkRx, Tidal Vision Lead Another Slow week

BlinkRx, Tidal Vision Lead Another Slow week

Stay Connected test

  • 23.9k Followers
  • 99 Subscribers
  • Trending
  • Comments
  • Latest
A tiny new open source AI model performs as well as powerful big ones

A tiny new open source AI model performs as well as powerful big ones

0
Water Cooler Small Talk: The Birthday Paradox 🎂🎉 | by Maria Mouschoutzi, PhD | Sep, 2024

Water Cooler Small Talk: The Birthday Paradox 🎂🎉 | by Maria Mouschoutzi, PhD | Sep, 2024

0
Ghost of Yōtei: The acclaimed Ghost of Tsushima is getting a sequel

Ghost of Yōtei: The acclaimed Ghost of Tsushima is getting a sequel

0
Best Headphones for Working Out (2024): Bose, Shokz, JLab

Best Headphones for Working Out (2024): Bose, Shokz, JLab

0
Best 5 AI semantic reasoning tools for databases

Best 5 AI semantic reasoning tools for databases

2026-01-10
What new legal challenges mean for the future of US offshore wind

What new legal challenges mean for the future of US offshore wind

2026-01-10
Data Science Spotlight: Selected Problems from Advent of Code 2025

Data Science Spotlight: Selected Problems from Advent of Code 2025

2026-01-10
SpaceX gets FCC permission to launch another 7,500 Starlink satellites

SpaceX gets FCC permission to launch another 7,500 Starlink satellites

2026-01-10

Recent News

Best 5 AI semantic reasoning tools for databases

Best 5 AI semantic reasoning tools for databases

2026-01-10
What new legal challenges mean for the future of US offshore wind

What new legal challenges mean for the future of US offshore wind

2026-01-10
Data Science Spotlight: Selected Problems from Advent of Code 2025

Data Science Spotlight: Selected Problems from Advent of Code 2025

2026-01-10
SpaceX gets FCC permission to launch another 7,500 Starlink satellites

SpaceX gets FCC permission to launch another 7,500 Starlink satellites

2026-01-10
Footer logo

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow Us

Browse by Category

  • AI & Cloud Computing
  • AI & Cybersecurity
  • AI & Sentiment Analysis
  • AI Applications
  • AI Ethics
  • AI Future Predictions
  • AI in Education
  • AI in Fintech
  • AI in Gaming
  • AI in Healthcare
  • AI in Startups
  • AI Innovations
  • AI News
  • AI Research
  • AI Tools & Automation
  • Apps
  • AR/VR & AI
  • Business
  • Deep Learning
  • Emerging Technologies
  • Entertainment
  • Fashion
  • Food
  • Gadget
  • Gaming
  • Health
  • Lifestyle
  • Machine Learning
  • Mobile
  • Movie
  • Music
  • News
  • Politics
  • Review
  • Robotics & Smart Systems
  • Science
  • Sports
  • Startup
  • Tech
  • Travel
  • World

Recent News

Best 5 AI semantic reasoning tools for databases

Best 5 AI semantic reasoning tools for databases

2026-01-10
What new legal challenges mean for the future of US offshore wind

What new legal challenges mean for the future of US offshore wind

2026-01-10
  • About
  • Advertise
  • Privacy & Policy
  • Contact

© 2026 JNews - Premium WordPress news & magazine theme by Jegtheme.

Welcome Back!

Login to your account below

Forgotten Password?

Retrieve your password

Please enter your username or email address to reset your password.

Log In
No Result
View All Result

© 2026 JNews - Premium WordPress news & magazine theme by Jegtheme.