Policy Outputs
Privacy (PII)
redacted_text
- Text with specified entity/regex/blocked types redacted
redacted_entities
- Dict mapping each redacted type to a dict mapping each uniquely redacted entity to a list of the entities it replaced
redacted_entity_positions
- List of tuples containing a unique redacted entity and the span positions it refers to in the plaintext.
Example:
{
  "redacted_entities": {
    "LOC": {
      "<LOC_1>": ["US"]
    }
  },
  "redacted_entity_positions": [
    ["<LOC_1>", 19, 26]
  ],
  "redacted_text": "Who is the current <LOC_1> president?"
}
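The payload above carries enough information to reverse the redaction: each placeholder maps back to the value(s) it replaced, and each position tuple gives the placeholder's span in the redacted text. A minimal sketch, assuming the payload shape shown in the example (the `unredact` helper is illustrative, not part of any documented API):

```python
# Example payload, copied from the Privacy (PII) output above.
redaction = {
    "redacted_entities": {
        "LOC": {"<LOC_1>": ["US"]},
    },
    "redacted_entity_positions": [["<LOC_1>", 19, 26]],
    "redacted_text": "Who is the current <LOC_1> president?",
}

def unredact(payload):
    """Replace each placeholder with the first original value it stood for."""
    # Flatten {type: {placeholder: [originals...]}} into {placeholder: original}.
    originals = {
        placeholder: values[0]
        for by_type in payload["redacted_entities"].values()
        for placeholder, values in by_type.items()
    }
    text = payload["redacted_text"]
    # Apply spans right-to-left so earlier offsets stay valid after replacement.
    for placeholder, start, end in sorted(
        payload["redacted_entity_positions"], key=lambda t: t[1], reverse=True
    ):
        text = text[:start] + originals[placeholder] + text[end:]
    return text

print(unredact(redaction))  # Who is the current US president?
```

Replacing spans from the end of the string backwards keeps earlier offsets valid even when a placeholder and its original value differ in length.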
Toxicity
classification
- Classification (safe or unsafe)
reason
- Reason for the classification
Hallucination
avg_entailment_probability
- Average entailment probability. Higher is better
RAG Hallucination
retrieval_relevance
- Probability representing how relevant the retrieved context is to the user prompt. Higher is better
response_faithfulness
- Probability representing how relevant the model response is to the user prompt. Higher is better
response_relevance
- Probability representing how relevant the model response is to the retrieved context. Higher is better
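Since all three RAG hallucination scores share the same "higher is better" orientation, a caller can gate a response on all of them at once. A minimal sketch, assuming the field names above; the threshold value and the `rag_passes` helper are illustrative, not documented defaults:

```python
def rag_passes(scores, threshold=0.5):
    """Return True only if every RAG hallucination score clears the threshold."""
    keys = ("retrieval_relevance", "response_faithfulness", "response_relevance")
    return all(scores[k] >= threshold for k in keys)

print(rag_passes({
    "retrieval_relevance": 0.9,
    "response_faithfulness": 0.8,
    "response_relevance": 0.7,
}))  # True
```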
Content (Alignment)
guard_classification
- Classification (safe/unsafe) the guardrail model gave to the query
guard_rationale
- Rationale for the classification
violated
- Boolean indicating whether this policy was violated
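A caller would typically act on either the `violated` flag or the guardrail's own classification. A minimal sketch, assuming the field names above; the `should_block` helper and its blocking rule are illustrative assumptions, not a documented behavior:

```python
def should_block(output):
    """Block if the policy was violated or the guardrail flagged the query."""
    return output["violated"] or output["guard_classification"] == "unsafe"

print(should_block({
    "guard_classification": "unsafe",
    "guard_rationale": "Query requests disallowed content.",
    "violated": True,
}))  # True
```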