This is the LLM attack where the user’s input directly tries to override or redirect the model’s intended instructions.
What is Direct Prompt Injection?
This is prompt injection delivered through external content the model reads (documents, websites, emails) rather than the user typing it directly.
What is Indirect Prompt Injection?
The DVAIA lab theme where an AI agent has too many permissions and can be manipulated into doing things a user shouldn’t be able to trigger.
What is excessive agency?
This happens when model output is rendered as HTML in the browser instead of being escaped as text.
What is cross-site scripting (XSS)?
OWASP warns LLMs can reveal confidential data such as PII and API keys in responses; this is called:
What is sensitive data disclosure?
This technique attempts to bypass guardrails by rewriting the request using another language (or switching languages mid-stream).
What is a multi-language prompt injection?
Instead of attacking via the chat box, the attacker hides instructions inside data the model reads, like uploaded documents used for retrieval.
What is RAG document poisoning?
The most fundamental mitigation principle for reducing agent harm: grant only the minimum permissions necessary to do the job.
What is least privilege?
XSS where the malicious content appears immediately in the response because it came directly from the current request.
What is reflected XSS?
This is a “non-obvious” type of sensitive information that can leak: internal decision rules, workflows, or “how the app works” (often valuable to attackers).
What is business logic?
A technique where the attacker tells the model to “pretend” it’s a different persona (e.g., an admin, a dev, or a security tool) to bypass restrictions.
What is role playing?
In the DVAIA workshop scenario, this assistant agent is compromised when hidden instructions are embedded in message content it tries to summarize or respond to.
What is email agent injection?
This security control, called _____-in-the-loop approval, helps prevent “do it because the model said so” by requiring explicit approvals before sensitive actions.
What is human-in-the-loop approval?
XSS where the malicious content is saved (for example, into a profile field) and later re-displayed to trigger again.
What is stored XSS?
The best general defense pattern is minimizing what the model can ever see: don’t embed secrets in prompts, and limit access to only what’s needed—commonly summarized as:
What is data minimization + least privilege?
This direct prompt-injection goal tries to reveal the model’s hidden instructions or configuration text.
What is system prompt extraction?
Clue: The most important defense concept here is to treat retrieved/email/document text as ___, not commands.
What is data?
A dangerous sign: the agent can access “hidden” tools that normal users shouldn’t have, especially without access controls. What are _____ _____ tools?
What are exposed admin tools (unauthorized tool access)?
In the lab, an XSS payload that runs immediately in the current response is categorized as this type.
What is reflected XSS?
This vulnerability class happens when a system can be tricked into retrieving internal resources (often cloud or internal endpoints) instead of only intended external content.
What is SSRF (Server-Side Request Forgery)?
In the workshop, this umbrella concept covers attempts to break safety boundaries outright (often framed as “ignore prior rules and comply”).
What is a jailbreak?
Name the testing guidance number from OWASP LLM Top 10 2025 that describes a flow where a plugin/tool retrieves external content and the system merges it into the model prompt, causing unintended behavior — this is the “indirect” threat scenario for:
What is LLM01 Indirect Prompt Injection?
The risk pattern where the model is allowed to both decide and execute high-impact actions (like sending, deleting, transferring, or disclosing) with no guardrails. What is o___-p________ autonomous action (unsafe tool orchestration)?
What is over-permissioned autonomous action?
According to the OWASP AI Testing Guide, this failure allows model‑generated JavaScript to execute in the browser when output handling does not properly encode content. What is i_____ o_____ h____ leading to XSS?
What is improper output handling leading to XSS?
The “Vulnerable vs Fixed” toggle in Sensitive Data Disclosure is used to compare attack vs defense, including that “traversal [is] blocked” in fixed mode — that suggests defenses like validation/sanitization of:
What are file paths / resource identifiers (inputs that could enable path traversal)?