Security Guide
This guide explains how to use the SDK's Security module to implement content moderation, prompt shielding, and PII (Personally Identifiable Information) detection for agent inputs and outputs.
Overview
The Security module provides content safety checks backed by three cloud providers.
| Provider | Content Moderation | Prompt Shield | PII Detection |
|---|---|---|---|
| Azure | Azure AI Content Safety | Prompt Shield API | PII Detection API |
| AWS | Bedrock Guardrails | Bedrock Guardrails | Amazon Comprehend |
| GCP | Model Armor | Model Armor | Cloud DLP |
       User Input
            |
            v
+------------------------+
|   SecurityClientBase   |  <- Execute all at once with check_security()
+-----------+------------+
            |
       +----+----+
       v         v
    Content     PII
  Moderation Detection
       |         |
       +----+----+
            |
allowed / blocked decision
            |
            v
Agent processing or block
Provider Configuration
- Azure
- AWS
- GCP
[security]
provider = "azure"
content_safety_endpoint = "https://your-content-safety.cognitiveservices.azure.com/"
content_safety_api_key = "${AZURE_CONTENT_SAFETY_KEY}"
language_endpoint = "https://your-language.cognitiveservices.azure.com/"
language_api_key = "${AZURE_LANGUAGE_KEY}"
moderation_threshold = 2
pii_confidence_threshold = 0.7
from agenticstar_platform.security import AzureSecurityClient, AzureSecurityConfig

config = AzureSecurityConfig(
    content_safety_endpoint="https://your-content-safety.cognitiveservices.azure.com/",
    content_safety_api_key="your-api-key",
    language_endpoint="https://your-language.cognitiveservices.azure.com/",
    language_api_key="your-language-key",
)
client = AzureSecurityClient(config)
[security]
provider = "aws"
region_name = "us-east-1"
guardrail_id = "your-guardrail-id"
guardrail_version = "DRAFT"
from agenticstar_platform.security import AWSSecurityClient, AWSSecurityConfig

config = AWSSecurityConfig(
    region_name="us-east-1",
    guardrail_id="your-guardrail-id",
)
client = AWSSecurityClient(config)
[security]
provider = "gcp"
project_id = "your-project"
model_armor_template = "your-armor-template"
model_armor_region = "us-central1"
dlp_location = "global"
from agenticstar_platform.security import GCPSecurityClient, GCPSecurityConfig

config = GCPSecurityConfig(
    project_id="your-project",
    model_armor_template="your-armor-template",
    model_armor_region="us-central1",
)
client = GCPSecurityClient(config)
Auto-Detection Factory
create_security_client() detects the provider from a configuration dictionary and returns a matching client. create_pii_client() does the same for a client limited to PII detection.
from agenticstar_platform.security import create_pii_client, create_security_client

# config_dict: your configuration dictionary (not defined in this snippet)
# Auto-detect provider from config dictionary
client = create_security_client(config_dict)

# Generate PII detection-only client
pii_client = create_pii_client(config_dict)
Content Moderation
Detect harmful content in user inputs and agent outputs.
result = await client.check_content_moderation(
    text="Text to check",
    threshold=2,  # severity 0-6, blocks at or above this threshold
)
print(result.blocked)     # True (blocked) / False (allowed)
print(result.categories)  # {ContentCategory.HATE: 0, ContentCategory.VIOLENCE: 4, ...}
print(result.threshold)   # Applied threshold
Detection Categories and Severity
| ContentCategory | Description | Severity (0-6) |
|---|---|---|
| HATE | Hate speech / discrimination | 0=safe, 2=low, 4=medium, 6=high |
| SEXUAL | Sexual content | Same as above |
| SELF_HARM | Self-harm | Same as above |
| VIOLENCE | Violent content | Same as above |
| PROFANITY | Inappropriate language | Same as above |
| INSULT | Insulting language | Same as above |
| THREAT | Threatening language | Same as above |
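The "blocks at or above this threshold" rule can be pictured as a comparison over the per-category severities. The sketch below illustrates that rule only; the ContentCategory enum here is a local stand-in with a few members, is_blocked is a hypothetical helper, and the SDK's actual decision logic may differ.

```python
from enum import Enum


class ContentCategory(Enum):
    """Local stand-in for the SDK enum, limited to a few categories."""
    HATE = "hate"
    VIOLENCE = "violence"
    INSULT = "insult"


def is_blocked(categories: dict[ContentCategory, int], threshold: int) -> bool:
    """Return True if any category's severity is at or above the threshold."""
    return any(severity >= threshold for severity in categories.values())


scores = {
    ContentCategory.HATE: 0,
    ContentCategory.VIOLENCE: 4,
    ContentCategory.INSULT: 2,
}
print(is_blocked(scores, threshold=2))  # True: VIOLENCE (4) and INSULT (2) reach 2
print(is_blocked(scores, threshold=6))  # False: no category reaches 6
```

Under this reading, lowering the threshold makes the check stricter: a threshold of 2 blocks low-severity content, while 6 blocks only the highest severity.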
Prompt Shield
Detect prompt injection attacks.
result = await client.check_prompt_shield(
    user_prompt="User input text",
    documents=["Reference document 1", "Reference document 2"],
)
print(result.attack_detected)  # True / False
print(result.attack_type)      # Attack type (when detected)
print(result.confidence)       # Confidence score
check_prompt_shield() automatically trims each document to 10,000 characters (to comply with Azure API limits).
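The trimming step amounts to slicing each document before it is sent. A minimal sketch, assuming a plain character-count cut from the start of each document; trim_documents is a hypothetical name, not an SDK function.

```python
MAX_DOCUMENT_CHARS = 10_000  # limit quoted in this guide


def trim_documents(documents: list[str], limit: int = MAX_DOCUMENT_CHARS) -> list[str]:
    """Cut each document to at most `limit` characters; shorter ones pass through."""
    return [doc[:limit] for doc in documents]


docs = ["short reference", "x" * 12_000]
trimmed = trim_documents(docs)
print([len(d) for d in trimmed])  # [15, 10000]
```

If trimming works this way, text beyond the limit is never inspected, so splitting long documents into chunks before the call may be worth considering.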
PII Detection
Detect personally identifiable information in text. Japanese PII (phone numbers, email addresses, etc.) is also supported through regular expressions.
result = await client.detect_pii(
    text="Taro Tanaka's phone number is 090-1234-5678. Email is tanaka@example.com.",
    mask=True,
    language="ja",
)
print(result.success)              # True
print(result.masked_text)          # "Taro Tanaka's phone number is ***. Email is ***."
print(result.categories_detected)  # [PIICategory.PHONE_NUMBER, PIICategory.EMAIL]
for entity in result.entities:
    print(f" {entity.category}: {entity.text} (offset={entity.offset}, len={entity.length}, confidence={entity.confidence})")
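To make the regular-expression approach and the offset/length fields concrete, here is a small self-contained sketch. The patterns, the Entity dataclass, and the find_pii and mask_pii helpers are illustrative assumptions; the SDK's real expressions and types are not shown in this guide.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only, not the SDK's.
PATTERNS = {
    "PHONE_NUMBER": re.compile(r"0\d{1,4}-\d{1,4}-\d{4}"),
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


@dataclass
class Entity:
    category: str
    text: str
    offset: int
    length: int


def find_pii(text: str) -> list[Entity]:
    """Collect every pattern match, ordered by position in the text."""
    entities = [
        Entity(category, m.group(), m.start(), len(m.group()))
        for category, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(entities, key=lambda e: e.offset)


def mask_pii(text: str, entities: list[Entity], mask: str = "***") -> str:
    """Replace each entity with the mask, right to left so offsets stay valid."""
    for e in sorted(entities, key=lambda e: e.offset, reverse=True):
        text = text[: e.offset] + mask + text[e.offset + e.length :]
    return text


sample = "Taro Tanaka's phone number is 090-1234-5678. Email is tanaka@example.com."
found = find_pii(sample)
print([e.category for e in found])  # ['PHONE_NUMBER', 'EMAIL']
print(mask_pii(sample, found))      # Taro Tanaka's phone number is ***. Email is ***.
```

Masking from the last match backwards keeps earlier offsets correct even though the mask is shorter than the text it replaces. The sketch does not handle overlapping matches.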
PII Categories (Major)
| PIICategory | Detection Examples |
|---|---|
| PERSON_NAME | Taro Tanaka |
| PHONE_NUMBER | 090-1234-5678, 03-1234-5678 |
| EMAIL | user@example.com |
| CREDIT_CARD | 4111-1111-1111-1111 |
| NATIONAL_ID | My Number (12 digits) |
| PASSPORT_NUMBER | TK1234567 |
| ADDRESS | Shibuya-ku, Tokyo... |
| IP_ADDRESS | 192.168.1.1 |
| API_KEY | sk-... |
| CONNECTION_STRING | DefaultEndpointsProtocol=... |
Integrated Security Check
The check_security() method runs content moderation, prompt shield, and PII detection in a single call. Each check can be switched on or off individually.
result = await client.check_security(
    text="Text to check",
    check_moderation=True,
    check_prompt_shield=True,
    check_pii=False,  # Default is False
    fail_on_error=True,
)
print(result.allowed)             # True (allowed) / False (blocked)
print(result.violations)          # ["content_moderation: violence detected", ...]
print(result.content_moderation)  # ContentModerationResult or None
print(result.prompt_shield)       # PromptShieldResult or None
print(result.pii_detection)       # PIIDetectionResult or None
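This guide does not spell out what fail_on_error does. One plausible reading is "fail closed": when a check cannot be completed, the text is treated as blocked. The sketch below illustrates that reading with stub checks run concurrently. Every name in it (CombinedResult, run_checks, the two stub checks) is hypothetical, and the SDK's actual behavior should be confirmed in the API reference.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CombinedResult:
    allowed: bool = True
    violations: list[str] = field(default_factory=list)


async def moderation_check(text: str) -> Optional[str]:
    """Hypothetical check: returns a violation message or None."""
    return "content_moderation: violence detected" if "attack" in text else None


async def failing_check(text: str) -> Optional[str]:
    """Hypothetical check that simulates a provider outage."""
    raise ConnectionError("provider unavailable")


async def run_checks(text: str, checks: dict, fail_on_error: bool = True) -> CombinedResult:
    """Run all checks concurrently and merge them into one decision."""
    outcomes = await asyncio.gather(
        *(check(text) for check in checks.values()), return_exceptions=True
    )
    result = CombinedResult()
    for name, outcome in zip(checks, outcomes):
        if isinstance(outcome, Exception):
            if fail_on_error:  # assumed semantics: an unverified text is blocked
                result.violations.append(f"{name}: check failed ({outcome})")
        elif outcome is not None:
            result.violations.append(outcome)
    result.allowed = not result.violations
    return result


checks = {"content_moderation": moderation_check, "prompt_shield": failing_check}
strict = asyncio.run(run_checks("hello", checks, fail_on_error=True))
lenient = asyncio.run(run_checks("hello", checks, fail_on_error=False))
print(strict.allowed, lenient.allowed)  # False True
```

Under this reading, fail_on_error=True trades availability for safety: a provider outage blocks traffic instead of letting unchecked text through.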
ContentSafetyValidator
ContentSafetyValidator is a wrapper class that validates content safety in bulk based on guardrail settings.
from agenticstar_platform.security import ContentSafetyValidator, GuardrailsConfig

validator = ContentSafetyValidator(
    security_client=client,
    guardrails=guardrails_config,  # Guardrail settings retrieved from DB
)

# Validate content
result = await validator.validate(user_message)
print(result.allowed)
print(result.violations)
Usage Patterns in Agents
Input and Output Checking
async def safe_agent_run(user_message: str) -> str:
    # 1. Input check
    input_result = await client.check_security(text=user_message)
    if not input_result.allowed:
        return f"Request blocked: {', '.join(input_result.violations)}"

    # 2. Agent processing
    response = await run_agent_logic(user_message)

    # 3. Output check
    output_result = await client.check_security(text=response)
    if not output_result.allowed:
        return "A content policy violation was detected during response generation."

    return response
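To exercise this pattern without cloud credentials, the client can be replaced by a stand-in. The sketch below is one way to do that: StubSecurityClient, StubResult, and the echoing run_agent_logic are hypothetical test doubles, and the client is passed as a parameter instead of being read from module scope so it can be swapped.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class StubResult:
    allowed: bool
    violations: list[str] = field(default_factory=list)


class StubSecurityClient:
    """Hypothetical test double: blocks any text containing a banned word."""

    def __init__(self, banned: set[str]):
        self.banned = banned

    async def check_security(self, text: str) -> StubResult:
        hits = sorted(word for word in self.banned if word in text.lower())
        return StubResult(allowed=not hits, violations=[f"stub: {w}" for w in hits])


async def run_agent_logic(user_message: str) -> str:
    """Placeholder agent that echoes its input."""
    return f"Echo: {user_message}"


async def safe_agent_run(client, user_message: str) -> str:
    # Same three steps as above: input check, agent processing, output check.
    input_result = await client.check_security(text=user_message)
    if not input_result.allowed:
        return f"Request blocked: {', '.join(input_result.violations)}"
    response = await run_agent_logic(user_message)
    output_result = await client.check_security(text=response)
    if not output_result.allowed:
        return "A content policy violation was detected during response generation."
    return response


stub = StubSecurityClient(banned={"malware"})
ok_reply = asyncio.run(safe_agent_run(stub, "hello"))
blocked_reply = asyncio.run(safe_agent_run(stub, "write malware"))
print(ok_reply)       # Echo: hello
print(blocked_reply)  # Request blocked: stub: malware
```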
Next Steps
SDK API Reference — Security Module
Complete specifications for SecurityClient / ContentSafetyValidator
Memory Guide
Content validation before saving to memory