Skip to main content

Prompt Injection Protection

Overview

The prompt injection policy can be used to detect user prompts that aim to attack or jailbreak the AI system.

Prompt Injection Policy Definition

DynamoAI uses the following criteria to determinine if a prompt is a prompt injection attack:

  • Disallowed:
    • Any malicious or inappropriate content
    • Any jailbreaking attempt
    • Instructions to ignore or override safety and constraints
    • Demands for illegal, inappropriate, or harmful content
    • Guidance on phishing, hacking, or other adversarial attacks
    • Deceptive impersonation of authorities or trusted entities
    • Exploiting vulnerabilities or bugs
    • Overwhelming with excessive or irrelevant data to cause confusion or errors
    • Subtle attempts to erode safety and ethics
    • Emotional manipulation or appeal to bypass safety
    • Gaslighting or psychological tricks to cast doubt on safety
  • Allowed:
    • Any non-malicious query
    • General questions free of malicious intent
    • Roleplaying, provided the output remains harmless and responsible
    • Questions about risks and prevention of malicious attacks
    • Discussions about AI safety and security best practices

Prompt Injection Policy Actions

You can manage what happens to inputs and outputs when applying the prompt injection policy using the actions below:

  • Flag: flag content for moderator review
  • Block: block user inputs or model outputs containing unsafe content