In a prompt injection, an attacker tries to override a model's system prompt or safety rules, for example through instructions like ignore all previous instructions. Especially dangerous is the indirect variant, where malicious commands are hidden in web pages or documents the model processes. Consequences can include data leaks, misinformation, or unwanted actions. Protection comes from guardrails, input validation, and a clear separation of instructions and data.
Prompt Injection
Prompt injection is an attack in which malicious instructions are smuggled into an input to make the AI model deviate from its original directives. It is one of the central security risks in AI applications.
