Large language models are supposed to shut down when users ask for dangerous help, from building weapons to writing malware.
frontier proprietary and open-weight models yielded high attack success rates when prompted in verse, indicating a deeper, ...
Research from Italy’s Icaro Lab found that poetry can be used to jailbreak AI and skirt safety protections.
A team of researchers found prompts that are so effective at tricking AI models that they're keeping them under wraps.
Across 25 state-of-the-art models, poetic prompts achieved an average “attack success rate” of 62% for handcrafted poems and ...
Wow! It feels so long since our last Wednesday Poetry Prompt (almost like another time and place), but it's only been a little over a month. So weird how time works at times. For this week's prompt, ...
For this week's prompt, write an appraisal poem. Of course, people are used to concepts such as home appraisals and appraising jewelry. However, the poem could be a self-appraisal, or appraise ...
Riddle-like poems tricked chatbots into spewing hate speech and helping design nuclear weapons and nerve agents. It turns out ...
Growing concerns around AI safety have intensified as new research uncovers unexpected weaknesses in leading language models.
Poetry -based prompts can bypass safety features in AI models like ChatGPT to obtain instructions for creating malware or ...