Security Guide

This guide explains how to use the SDK's Security module to implement content moderation, prompt shielding, and PII (Personally Identifiable Information) detection for agent inputs and outputs.

Overview

The Security module provides content safety checks across three cloud providers: Azure, AWS, and GCP.

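| Provider | Content Moderation | Prompt Shield | PII Detection |
|----------|--------------------|---------------|---------------|
| Azure | Azure AI Content Safety | Prompt Shield API | PII Detection API |
| AWS | Bedrock Guardrails | Bedrock Guardrails | Amazon Comprehend |
| GCP | Model Armor | Model Armor | Cloud DLP |
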
         User Input
             |
             v
+------------------------+
|   SecurityClientBase   |  <- Execute all at once with check_security()
+------------+-----------+
             |
        +----+----+
        v         v
     Content     PII
   Moderation  Detection
        |         |
        +----+----+
             |
 allowed / blocked decision
             |
             v
 Agent processing or block

Provider Configuration

config.toml (TOML)
[security]
provider = "azure"
content_safety_endpoint = "https://your-content-safety.cognitiveservices.azure.com/"
content_safety_api_key = "${AZURE_CONTENT_SAFETY_KEY}"
language_endpoint = "https://your-language.cognitiveservices.azure.com/"
language_api_key = "${AZURE_LANGUAGE_KEY}"
moderation_threshold = 2
pii_confidence_threshold = 0.7
Python
from agenticstar_platform.security import AzureSecurityClient, AzureSecurityConfig

config = AzureSecurityConfig(
    content_safety_endpoint="https://your-content-safety.cognitiveservices.azure.com/",
    content_safety_api_key="your-api-key",
    language_endpoint="https://your-language.cognitiveservices.azure.com/",
    language_api_key="your-language-key",
)
client = AzureSecurityClient(config)
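
The config.toml example refers to secrets through `${AZURE_CONTENT_SAFETY_KEY}`-style placeholders, while the Python example passes the keys as literals. How the SDK resolves those placeholders is not described in this guide, so the following is only a minimal sketch, assuming you load the settings yourself and want to fill the placeholders from environment variables before building a config object. `expand_env_placeholders` is a hypothetical helper, not part of the SDK.

```python
import os
import re

# Matches ${NAME} placeholders such as ${AZURE_CONTENT_SAFETY_KEY}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_placeholders(settings: dict, env=None) -> dict:
    """Return a copy of settings with ${NAME} in string values filled from env.

    Hypothetical helper for illustration; not part of the SDK.
    Raises KeyError when a referenced variable is not set, so a missing
    secret fails loudly instead of being sent as a literal "${...}".
    """
    env = os.environ if env is None else env

    def expand(value):
        if isinstance(value, str):
            return _PLACEHOLDER.sub(lambda match: env[match.group(1)], value)
        return value  # numbers such as thresholds pass through unchanged

    return {key: expand(value) for key, value in settings.items()}


security_settings = {
    "provider": "azure",
    "content_safety_api_key": "${AZURE_CONTENT_SAFETY_KEY}",
    "moderation_threshold": 2,
}
resolved = expand_env_placeholders(
    security_settings, env={"AZURE_CONTENT_SAFETY_KEY": "example-key"}
)
print(resolved["content_safety_api_key"])  # example-key
```

Passing `env` explicitly keeps the helper easy to test; in an application you would normally omit it and let the helper read `os.environ`.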

Auto-Detection Factory

The factory functions detect the provider from a configuration dictionary and create the matching client.

Python
from agenticstar_platform.security import create_pii_client, create_security_client

# config_dict holds your security settings as a dictionary.

# Auto-detect the provider from the config dictionary
client = create_security_client(config_dict)

# Create a client that performs PII detection only
pii_client = create_pii_client(config_dict)
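
As a rough illustration of the pattern (not the SDK's actual implementation), a factory of this kind can be a lookup from a provider name to a client class. The sketch below assumes the dictionary carries a `provider` key like the `[security]` table in config.toml; the values `"aws"` and `"gcp"`, the `Fake*Client` classes, and `create_client_sketch` are all hypothetical stand-ins.

```python
class _FakeClient:
    """Hypothetical stand-in for a provider-specific security client."""

    def __init__(self, settings: dict):
        self.settings = settings


class FakeAzureClient(_FakeClient):
    pass


class FakeAwsClient(_FakeClient):
    pass


class FakeGcpClient(_FakeClient):
    pass


# Assumed provider names; only "azure" appears in this guide's config example.
_CLIENTS = {
    "azure": FakeAzureClient,
    "aws": FakeAwsClient,
    "gcp": FakeGcpClient,
}


def create_client_sketch(settings: dict) -> _FakeClient:
    """Pick a client class from the 'provider' key and instantiate it."""
    provider = str(settings.get("provider", "")).lower()
    try:
        client_cls = _CLIENTS[provider]
    except KeyError:
        raise ValueError(f"Unsupported security provider: {provider!r}") from None
    return client_cls(settings)


sketch_client = create_client_sketch({"provider": "azure"})
print(type(sketch_client).__name__)  # FakeAzureClient
```

Failing with a clear error on an unknown provider keeps a typo in the configuration from silently disabling the checks.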

Content Moderation

Detect harmful content in user inputs and agent outputs.

Python
result = await client.check_content_moderation(
    text="Text to check",
    threshold=2,  # severity 0-6; blocks at or above this threshold
)

print(result.blocked)     # True (blocked) / False (allowed)
print(result.categories)  # {ContentCategory.HATE: 0, ContentCategory.VIOLENCE: 4, ...}
print(result.threshold)   # Applied threshold

Detection Categories and Severity

| ContentCategory | Description | Severity (0-6) |
|-----------------|-------------|----------------|
| HATE | Hate speech / discrimination | 0=safe, 2=low, 4=medium, 6=high |
| SEXUAL | Sexual content | Same as above |
| SELF_HARM | Self-harm | Same as above |
| VIOLENCE | Violent content | Same as above |
| PROFANITY | Inappropriate language | Same as above |
| INSULT | Insulting language | Same as above |
| THREAT | Threatening language | Same as above |
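
The threshold rule ("blocks at or above this threshold") can be pictured as a comparison over the per-category severities. The sketch below is an illustration under that assumption, not the SDK's code; `Category`, `is_blocked`, and `violating_categories` are hypothetical names, with `Category` standing in for `ContentCategory`.

```python
from enum import Enum


class Category(Enum):
    """Hypothetical stand-in for the SDK's ContentCategory."""

    HATE = "hate"
    VIOLENCE = "violence"


def violating_categories(severities: dict, threshold: int) -> list:
    """Return the categories whose severity is at or above the threshold."""
    return [cat for cat, severity in severities.items() if severity >= threshold]


def is_blocked(severities: dict, threshold: int) -> bool:
    """Block when at least one category reaches the threshold."""
    return bool(violating_categories(severities, threshold))


scores = {Category.HATE: 0, Category.VIOLENCE: 4}
print(is_blocked(scores, threshold=2))  # True: VIOLENCE (4) >= 2
print(is_blocked(scores, threshold=6))  # False: nothing reaches 6
```

With the severity scale above, a threshold of 2 blocks anything rated low or worse, while a threshold of 6 blocks only content rated high.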

Prompt Shield

Detect prompt injection attacks.

Python
result = await client.check_prompt_shield(
    user_prompt="User input text",
    documents=["Reference document 1", "Reference document 2"],
)

print(result.attack_detected)  # True / False
print(result.attack_type)      # Attack type (when detected)
print(result.confidence)       # Confidence score

check_prompt_shield() automatically trims each document to 10,000 characters (to comply with Azure API limits).
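
Because anything past the limit is dropped before the check, it can be useful to know in advance which documents will be cut. The following sketch mirrors the trimming rule with plain string slicing and reports the affected documents; `trim_documents` and `oversized_indexes` are hypothetical helpers, and counting characters as Python code points is an assumption about how the limit is measured.

```python
MAX_DOCUMENT_CHARS = 10_000  # per-document limit quoted in this guide


def trim_documents(documents: list, limit: int = MAX_DOCUMENT_CHARS) -> list:
    """Cut each document to at most `limit` characters."""
    return [doc[:limit] for doc in documents]


def oversized_indexes(documents: list, limit: int = MAX_DOCUMENT_CHARS) -> list:
    """Return the positions of documents that would lose content when trimmed."""
    return [i for i, doc in enumerate(documents) if len(doc) > limit]


docs = ["short reference", "x" * 12_000]
print([len(doc) for doc in trim_documents(docs)])  # [15, 10000]
print(oversized_indexes(docs))                     # [1]
```

If the tail of a long document matters, consider splitting it into several documents yourself rather than relying on the automatic trim.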

PII Detection

Detect personally identifiable information in text. Japanese text is supported: items such as phone numbers and email addresses are matched with regular expressions.

Python
result = await client.detect_pii(
    text="Taro Tanaka's phone number is 090-1234-5678. Email is tanaka@example.com.",
    mask=True,
    language="ja",
)

print(result.success)              # True
print(result.masked_text)          # "Taro Tanaka's phone number is ***. Email is ***."
print(result.categories_detected)  # [PIICategory.PHONE_NUMBER, PIICategory.EMAIL]
for entity in result.entities:
    print(
        f"  {entity.category}: {entity.text} "
        f"(offset={entity.offset}, len={entity.length}, confidence={entity.confidence})"
    )

PII Categories (Major)

| PIICategory | Detection Examples |
|-------------|--------------------|
| PERSON_NAME | Taro Tanaka |
| PHONE_NUMBER | 090-1234-5678, 03-1234-5678 |
| EMAIL | user@example.com |
| CREDIT_CARD | 4111-1111-1111-1111 |
| NATIONAL_ID | My Number (12 digits) |
| PASSPORT_NUMBER | TK1234567 |
| ADDRESS | Shibuya-ku, Tokyo... |
| IP_ADDRESS | 192.168.1.1 |
| API_KEY | sk-... |
| CONNECTION_STRING | DefaultEndpointsProtocol=... |
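
To make the regular-expression approach concrete, here is a minimal sketch of pattern-based detection and masking for two of the categories above. The patterns, the `Entity` class, and both functions are illustrative assumptions, not the SDK's own rules; real detection also covers categories such as names that simple patterns cannot find.

```python
import re
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical record loosely modelled on the entity fields shown above."""

    category: str
    text: str
    offset: int
    length: int


# Illustrative patterns only; not the SDK's actual rules.
PATTERNS = {
    "PHONE_NUMBER": re.compile(r"(?<![\d-])0\d{1,4}-\d{1,4}-\d{4}(?!\d)"),
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def find_entities(text: str) -> list:
    """Return every pattern match as an Entity, ordered by position."""
    entities = []
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append(
                Entity(category, match.group(), match.start(), len(match.group()))
            )
    return sorted(entities, key=lambda entity: entity.offset)


def mask_entities(text: str, entities: list, token: str = "***") -> str:
    """Replace each entity with the token, working from the end of the text
    so that earlier offsets stay valid."""
    for entity in sorted(entities, key=lambda e: e.offset, reverse=True):
        end = entity.offset + entity.length
        text = text[: entity.offset] + token + text[end:]
    return text


sample = "Taro Tanaka's phone number is 090-1234-5678. Email is tanaka@example.com."
found = find_entities(sample)
print([entity.category for entity in found])  # ['PHONE_NUMBER', 'EMAIL']
print(mask_entities(sample, found))
# Taro Tanaka's phone number is ***. Email is ***.
```

Masking from the last match backwards avoids recomputing offsets after each replacement, which matters because the mask token and the original text usually differ in length.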

Integrated Security Check

The check_security() method runs content moderation, prompt shield, and PII detection in a single call. Each check is switched on or off with its own flag.

Python
result = await client.check_security(
    text="Text to check",
    check_moderation=True,
    check_prompt_shield=True,
    check_pii=False,  # Default is False
    fail_on_error=True,
)

print(result.allowed)             # True (allowed) / False (blocked)
print(result.violations)          # ["content_moderation: violence detected", ...]
print(result.content_moderation)  # ContentModerationResult or None
print(result.prompt_shield)       # PromptShieldResult or None
print(result.pii_detection)       # PIIDetectionResult or None
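
One way to picture the combined result is as a verdict built from the individual check results, where a skipped check is `None` and contributes nothing. The sketch below illustrates that idea with hypothetical dataclasses. How the SDK actually combines the checks, including the effect of `fail_on_error` and of PII findings, is not specified in this guide, and the `prompt_shield` violation string is an invented format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModerationOutcome:
    """Hypothetical, reduced stand-in for ContentModerationResult."""

    blocked: bool
    reason: str = ""


@dataclass
class ShieldOutcome:
    """Hypothetical, reduced stand-in for PromptShieldResult."""

    attack_detected: bool


@dataclass
class Verdict:
    allowed: bool
    violations: list = field(default_factory=list)


def combine(
    moderation: Optional[ModerationOutcome],
    shield: Optional[ShieldOutcome],
) -> Verdict:
    """Allow the text only when no enabled check reports a violation."""
    violations = []
    if moderation is not None and moderation.blocked:
        violations.append(f"content_moderation: {moderation.reason}")
    if shield is not None and shield.attack_detected:
        violations.append("prompt_shield: attack detected")
    return Verdict(allowed=not violations, violations=violations)


verdict = combine(ModerationOutcome(True, "violence detected"), ShieldOutcome(False))
print(verdict.allowed)     # False
print(verdict.violations)  # ['content_moderation: violence detected']
```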

ContentSafetyValidator

ContentSafetyValidator is a wrapper class that validates content safety in bulk based on guardrail settings.

Python
from agenticstar_platform.security import ContentSafetyValidator, GuardrailsConfig

validator = ContentSafetyValidator(
    security_client=client,
    guardrails=guardrails_config,  # Guardrail settings retrieved from DB
)

# Validate content
result = await validator.validate(user_message)
print(result.allowed)
print(result.violations)

Usage Patterns in Agents

Input and Output Checking

Python
async def safe_agent_run(user_message: str) -> str:
    # 1. Input check
    input_result = await client.check_security(text=user_message)
    if not input_result.allowed:
        return f"Request blocked: {', '.join(input_result.violations)}"

    # 2. Agent processing
    response = await run_agent_logic(user_message)

    # 3. Output check
    output_result = await client.check_security(text=response)
    if not output_result.allowed:
        return "A content policy violation was detected during response generation."

    return response
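
To try this flow without cloud credentials, the same input/output pattern can be exercised against a stub. The sketch below is self-contained and makes several assumptions: `StubSecurityClient`, `StubResult`, `guarded_run`, and `echo_agent` are hypothetical, and the stub blocks on a simple word list rather than calling any provider.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class StubResult:
    """Hypothetical result exposing only the two fields the pattern reads."""

    allowed: bool
    violations: list = field(default_factory=list)


class StubSecurityClient:
    """Stand-in client: blocks any text that contains a banned word."""

    def __init__(self, banned: list):
        self.banned = banned

    async def check_security(self, text: str) -> StubResult:
        hits = [word for word in self.banned if word in text.lower()]
        return StubResult(allowed=not hits, violations=[f"stub: {w}" for w in hits])


async def guarded_run(client, agent, user_message: str) -> str:
    """Check the input, run the agent, then check the output."""
    input_result = await client.check_security(text=user_message)
    if not input_result.allowed:
        return f"Request blocked: {', '.join(input_result.violations)}"

    response = await agent(user_message)

    output_result = await client.check_security(text=response)
    if not output_result.allowed:
        return "A content policy violation was detected during response generation."
    return response


async def echo_agent(message: str) -> str:
    return f"You said: {message}"


stub_client = StubSecurityClient(banned=["attack"])
print(asyncio.run(guarded_run(stub_client, echo_agent, "hello")))
# You said: hello
print(asyncio.run(guarded_run(stub_client, echo_agent, "plan an attack")))
# Request blocked: stub: attack
```

Passing the client and the agent as arguments, instead of reading module-level names, makes the guard easy to unit test and to reuse across agents.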

Next Steps

SDK API Reference — Security Module

Complete specifications for SecurityClient / ContentSafetyValidator

Memory Guide

Content validation before saving to memory