Schneier - Indirect Instruction Injection in Multi-Modal LLMs

July 28, 2023

Interesting research: “(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs”:

Abstract: We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker’s instruction. We illustrate this attack with several proof-of-concept examples targeting LLaVa and PandaGPT.
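To make the idea of an "adversarial perturbation corresponding to the prompt" concrete, here is a minimal toy sketch in Python. It is not the paper's code and does not touch LLaVa or PandaGPT. It assumes a hypothetical stand-in model: a fixed linear layer mapping a four-value "image" to scores over three made-up output tokens. The names `TOKENS`, `WEIGHTS`, `predict`, and `perturb` are all invented for illustration. The technique shown is a generic targeted, iterative signed-gradient perturbation with a per-pixel budget, which may differ from the procedure the authors actually use. The budget is large only because the toy has four pixels.

```python
import math

# Hypothetical stand-in for a multi-modal model: one linear layer that maps
# a 4-value "image" to scores over 3 output tokens. All values are made up.
TOKENS = ["cat", "dog", "attacker-text"]
WEIGHTS = [
    [3.0, 1.0, -1.0, 2.0],    # scores for "cat"
    [1.0, -1.0, 2.0, 0.0],    # scores for "dog"
    [-3.0, 2.0, 3.0, -3.0],   # scores for "attacker-text"
]

def scores(image):
    """Linear score for each token."""
    return [sum(w * x for w, x in zip(row, image)) for row in WEIGHTS]

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def predict(image):
    """Return the highest-scoring token."""
    s = scores(image)
    return TOKENS[s.index(max(s))]

def perturb(image, target, eps=0.25, step=0.05, iters=10):
    """Take signed-gradient steps that lower the cross-entropy loss for
    `target`, keeping every pixel within +/- eps of the original."""
    t = TOKENS.index(target)
    adv = list(image)
    for _ in range(iters):
        probs = softmax(scores(adv))
        for j in range(len(adv)):
            # d(-log p[t]) / d x[j] for a linear-softmax model
            grad = sum(
                (probs[k] - (1.0 if k == t else 0.0)) * WEIGHTS[k][j]
                for k in range(len(TOKENS))
            )
            sign = (grad > 0) - (grad < 0)
            moved = adv[j] - step * sign
            # Project back into the eps-ball, then into the valid pixel range.
            moved = min(max(moved, image[j] - eps), image[j] + eps)
            adv[j] = min(max(moved, 0.0), 1.0)
    return adv

image = [0.5, 0.5, 0.5, 0.5]
adv = perturb(image, "attacker-text")
print(predict(image))  # cat
print(predict(adv))    # attacker-text
```

The model weights are never changed, which matches the abstract's "unmodified, benign" model: only the input moves, and the bounded perturbation is enough to flip the output to the attacker-chosen token.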

from Schneier on Security https://www.schneier.com/blog/archives/2023/07/indirect-instruction-injection-in-multi-modal-llms.html
