Policy Types

Privacy (PII)

These policies identify sensitive information in a given string. You can customize which entity types to target, add regular expressions to match, and list explicit words to block. A PII policy outputs a list of the PII it detected and can optionally redact that PII from the text.
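
The detect-then-redact behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the platform's implementation or API: the `PIIPolicy` class, its `patterns` and `blocked_words` fields, and the `BLOCKED_WORD` label are all hypothetical names, and real entity-type detection is replaced here by plain regular expressions.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (entity_type, start, end, matched_text)
Finding = Tuple[str, int, int, str]


@dataclass
class PIIPolicy:
    """Hypothetical stand-in for a PII policy; not a real API."""
    patterns: Dict[str, str] = field(default_factory=dict)   # entity type -> regex
    blocked_words: List[str] = field(default_factory=list)   # explicit words to block

    def detect(self, text: str) -> List[Finding]:
        """Return every regex and blocked-word match, ordered by position."""
        findings: List[Finding] = []
        for entity_type, pattern in self.patterns.items():
            for m in re.finditer(pattern, text):
                findings.append((entity_type, m.start(), m.end(), m.group()))
        for word in self.blocked_words:
            word_re = r"\b" + re.escape(word) + r"\b"
            for m in re.finditer(word_re, text, flags=re.IGNORECASE):
                findings.append(("BLOCKED_WORD", m.start(), m.end(), m.group()))
        return sorted(findings, key=lambda f: f[1])

    def redact(self, text: str) -> str:
        """Replace each finding with a [TYPE] placeholder."""
        boundary = len(text)
        # Work right to left so earlier offsets stay valid; skip overlapping spans.
        for entity_type, start, end, _ in reversed(self.detect(text)):
            if end > boundary:
                continue
            text = text[:start] + "[" + entity_type + "]" + text[end:]
            boundary = start
        return text


policy = PIIPolicy(
    patterns={"EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+"},
    blocked_words=["Project Falcon"],
)
message = "Mail jane@example.com about project falcon."
print(policy.detect(message))
print(policy.redact(message))
```

Keeping detection and redaction as separate steps mirrors the description above: the list of detected PII is always available, and redaction is an optional second pass over the same findings.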

Toxicity

These policies determine whether a given string is considered toxic. Each policy outputs a classification (safe/unsafe) for the given text and a rationale for that classification.

Hallucination

These policies determine whether a given model response entails the user prompt. Each policy outputs a score between 0 and 1 representing the entailment probability. Naturally, these policies can only be run on model responses.

RAG Hallucination

These policies determine how relevant the user prompt, retrieved context, and model response are to each other. Each policy outputs three probabilities between 0 and 1 representing retrieval relevance, response relevance, and response faithfulness. Naturally, these policies can only be run on model responses where the user message has RAG context.
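
One way a caller might act on the three probabilities is to compare each against a cutoff and report which checks fall short. The sketch below assumes a result shape and names that do not come from the platform: `RAGHallucinationResult`, its field names, `failed_checks`, and the `0.5` default threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RAGHallucinationResult:
    """Hypothetical container for the three scores; field names are assumptions."""
    retrieval_relevance: float    # retrieved context vs. user prompt
    response_relevance: float     # model response vs. user prompt
    response_faithfulness: float  # model response vs. retrieved context


def failed_checks(result: RAGHallucinationResult, threshold: float = 0.5) -> List[str]:
    """Return the names of the scores that fall below the threshold."""
    scores = {
        "retrieval_relevance": result.retrieval_relevance,
        "response_relevance": result.response_relevance,
        "response_faithfulness": result.response_faithfulness,
    }
    for name, score in scores.items():
        # The scores are described as probabilities, so reject anything outside [0, 1].
        if not 0.0 <= score <= 1.0:
            raise ValueError(name + " must be between 0 and 1, got " + str(score))
    return [name for name, score in scores.items() if score < threshold]


result = RAGHallucinationResult(
    retrieval_relevance=0.9,
    response_relevance=0.8,
    response_faithfulness=0.3,
)
print(failed_checks(result))
```

Reporting the failing scores by name, rather than a single pass/fail flag, keeps it visible whether a problem lies in retrieval or in the response itself.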

Content (Alignment)

These policies determine whether a given string violates a constitution or taxonomy. They are best created through our Guardrail Development Platform.
