Security Guide

This guide explains how to use the SDK's Security module to implement content moderation, prompt shielding, and PII (Personally Identifiable Information) detection for agent inputs and outputs.

Overview

The Security module provides content safety check functionality supporting 3 cloud providers.

Provider	Content Moderation	Prompt Shield	PII Detection
Azure	Azure AI Content Safety	Prompt Shield API	PII Detection API
AWS	Bedrock Guardrails	Bedrock Guardrails	Bedrock Guardrails (sensitiveInformationPolicy)
GCP	Model Armor	Model Armor	Cloud DLP

info

About PII Detection on AWS: PII detection uses the Bedrock Guardrails sensitiveInformationPolicy and supports multiple languages (including Japanese). Content Safety and PII detection share the same Guardrail and are evaluated together in a single apply_guardrail API call.

User Input
     |
     v
+------------------------+
| SecurityClientBase     |  <- Execute all at once with check_security()
+--------+---------------+
         |
    +----+----+
    v         v
Content    PII
Moderation Detection
    |         |
    +----+----+
         |
  allowed / blocked decision
         |
         v
  Agent processing or block

Provider Configuration

Azure
AWS
GCP

config.tomlToml
[security]
provider = "azure"
content_safety_endpoint = "https://your-content-safety.cognitiveservices.azure.com/"
content_safety_api_key = "${AZURE_CONTENT_SAFETY_KEY}"
language_endpoint = "https://your-language.cognitiveservices.azure.com/"
language_api_key = "${AZURE_LANGUAGE_KEY}"
moderation_threshold = 2
pii_confidence_threshold = 0.7

Python
from agenticstar_platform.security import AzureSecurityClient, AzureSecurityConfig

config = AzureSecurityConfig(
    content_safety_endpoint="https://your-content-safety.cognitiveservices.azure.com/",
    content_safety_api_key="your-api-key",
    language_endpoint="https://your-language.cognitiveservices.azure.com/",
    language_api_key="your-language-key",
)
client = AzureSecurityClient(config)

config.tomlToml
[security]
provider = "aws"
region_name = "us-east-1"
guardrail_id = "your-guardrail-id"
guardrail_version = "DRAFT"
# PII detection backend (default: "bedrock_guardrails")
#   "bedrock_guardrails": Bedrock Guardrails sensitiveInformationPolicy
#     (multi-language support including Japanese; shares guardrail_id/version)
#   "comprehend": Amazon Comprehend DetectPiiEntities (en/es only; legacy)
# pii_service = "bedrock_guardrails"

Python
from agenticstar_platform.security import AWSSecurityClient, AWSSecurityConfig

config = AWSSecurityConfig(
    region_name="us-east-1",
    guardrail_id="your-guardrail-id",
    # pii_service defaults to "bedrock_guardrails" (multi-language PII)
)
client = AWSSecurityClient(config)

info

The default value of pii_service is "bedrock_guardrails". Make sure to enable sensitiveInformationPolicy on the Bedrock Guardrails side (register NAME / EMAIL / PHONE / ADDRESS etc. in piiEntitiesConfig with action: ANONYMIZE). You can switch to Amazon Comprehend DetectPiiEntities by explicitly setting "comprehend", but this is generally not recommended since Comprehend only supports en/es.

config.tomlToml
[security]
provider = "gcp"
project_id = "your-project"
model_armor_template = "your-armor-template"
model_armor_region = "us-central1"
dlp_location = "global"

Python
from agenticstar_platform.security import GCPSecurityClient, GCPSecurityConfig

config = GCPSecurityConfig(
    project_id="your-project",
    model_armor_template="your-armor-template",
    model_armor_region="us-central1",
)
client = GCPSecurityClient(config)

Auto-Detection Factory

You can automatically detect the provider from a configuration dictionary and generate a client.

Python
from agenticstar_platform.security import create_security_client

# Auto-detect provider from config dictionary
client = create_security_client(config_dict)

# Generate PII detection-only client
from agenticstar_platform.security import create_pii_client
pii_client = create_pii_client(config_dict)

Content Moderation

Detect harmful content in user inputs and agent outputs.

Python
result = await client.check_content_moderation(
    text="Text to check",
    threshold=2,  # severity 0-6, blocks at or above this threshold
)

print(result.blocked)      # True (blocked) / False (allowed)
print(result.categories)   # {ContentCategory.HATE: 0, ContentCategory.VIOLENCE: 4, ...}
print(result.threshold)    # Applied threshold

Detection Categories and Severity

ContentCategory	Description	Severity (0-6)
`HATE`	Hate speech / discrimination	0=safe, 2=low, 4=medium, 6=high
`SEXUAL`	Sexual content	Same as above
`SELF_HARM`	Self-harm	Same as above
`VIOLENCE`	Violent content	Same as above
`PROFANITY`	Inappropriate language	Same as above
`INSULT`	Insulting language	Same as above
`THREAT`	Threatening language	Same as above

Prompt Shield

Detect prompt injection attacks.

Python
result = await client.check_prompt_shield(
    user_prompt="User input text",
    documents=["Reference document 1", "Reference document 2"],
)

print(result.attack_detected)  # True / False
print(result.attack_type)      # Attack type (when detected)
print(result.confidence)       # Confidence score

check_prompt_shield() automatically trims each document to 10,000 characters (to comply with Azure API limits).

PII Detection

Detect personally identifiable information in text.

Azure: PII Detection API (multi-language support)
AWS: Bedrock Guardrails sensitiveInformationPolicy by default (multi-language support including Japanese). Setting pii_service="comprehend" explicitly switches to Amazon Comprehend DetectPiiEntities (en/es only)
GCP: Cloud DLP (multi-language support)

Python
result = await client.detect_pii(
    text="Taro Tanaka's phone number is 090-1234-5678. Email is tanaka@example.com.",
    mask=True,
    language="ja",
)

print(result.success)              # True
print(result.masked_text)          # "Taro Tanaka's phone number is ***. Email is ***."
print(result.categories_detected)  # [PIICategory.PHONE_NUMBER, PIICategory.EMAIL]
for entity in result.entities:
    print(f"  {entity.category}: {entity.text} (offset={entity.offset}, len={entity.length}, confidence={entity.confidence})")

PII Categories (Major)

PIICategory	Detection Examples
`PERSON_NAME`	Taro Tanaka
`PHONE_NUMBER`	090-1234-5678, 03-1234-5678
`EMAIL`	user@example.com
`CREDIT_CARD`	4111-1111-1111-1111
`NATIONAL_ID`	My Number (12 digits)
`PASSPORT_NUMBER`	TK1234567
`ADDRESS`	Shibuya-ku, Tokyo...
`IP_ADDRESS`	192.168.1.1
`API_KEY`	sk-...
`CONNECTION_STRING`	DefaultEndpointsProtocol=...

Integrated Security Check

The check_security() method allows you to run content moderation, prompt shield, and PII detection all at once.

Python
result = await client.check_security(
    text="Text to check",
    check_moderation=True,
    check_prompt_shield=True,
    check_pii=False,           # Default is False
    fail_on_error=True,
)

print(result.allowed)              # True (allowed) / False (blocked)
print(result.violations)           # ["harmful_content", "jailbreak_attack", "pii_detected", ...]
print(result.content_moderation)   # ContentModerationResult or None
print(result.prompt_shield)        # PromptShieldResult or None
print(result.pii_detection)        # PIIDetectionResult or None

ContentSafetyValidator

ContentSafetyValidator is a wrapper class that validates and marks external content (fetched from web / browser / API / MCP) based on guardrail settings.

Python
from agenticstar_platform.security import ContentSafetyValidator, GuardrailsConfig

validator = ContentSafetyValidator(
    security_client=client,
    guardrails=guardrails_config,  # GuardrailsConfig retrieved from DB (defaults if omitted)
)

# Validate and sanitize external content (source_type is required)
result = await validator.validate_and_sanitize(
    content=external_content,
    source_type="web",                  # "web" / "browser" / "api" / "mcp"
    source_url="https://example.com",   # optional
    user_prompt=user_message,           # for Prompt Shield validation (optional)
)

if result.is_safe:
    print(result.marked_content)  # marked content (empty string when unsafe)
print(result.metadata)            # validation metadata

Usage Patterns in Agents

Input and Output Checking

Python
async def safe_agent_run(user_message: str) -> str:
    # 1. Input check
    input_result = await client.check_security(text=user_message)
    if not input_result.allowed:
        return f"Request blocked: {', '.join(input_result.violations)}"

    # 2. Agent processing
    response = await run_agent_logic(user_message)

    # 3. Output check
    output_result = await client.check_security(text=response)
    if not output_result.allowed:
        return "A content policy violation was detected during response generation."

    return response

Next Steps

SDK API Reference — Security Module

Complete specifications for SecurityClient / ContentSafetyValidator

ガイドを見る

Memory Guide

Content validation before saving to memory

ガイドを見る

Overview​

Provider Configuration​

Auto-Detection Factory​

Content Moderation​

Detection Categories and Severity​

Prompt Shield​

PII Detection​

PII Categories (Major)​

Integrated Security Check​

ContentSafetyValidator​

Usage Patterns in Agents​

Input and Output Checking​

Next Steps​