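Security Guide

This guide explains how to use the SDK's Security module to implement content moderation, prompt shielding, and PII (personally identifiable information) detection for agent inputs and outputs.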

Overview

The Security module provides content safety checks for three cloud providers.

Provider	Content moderation	Prompt shield	PII detection
Azure	Azure AI Content Safety	Prompt Shield API	PII Detection API
AWS	Bedrock Guardrails	Bedrock Guardrails	Bedrock Guardrails (sensitiveInformationPolicy)
GCP	Model Armor	Model Armor	Cloud DLP

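Note

PII detection on AWS: the module uses the sensitiveInformationPolicy of Bedrock Guardrails, which supports multiple languages, including Japanese. ContentSafety and PII detection share the same Guardrail, and both are evaluated through a single apply_guardrail API call path.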

User input
     │
     ▼
┌────────────────────────┐
│ SecurityClientBase     │  ← check_security() runs the checks in one call
└────────┬───────────────┘
         │
    ┌────┴────┐
    ▼         ▼
 Content     PII
 moderation  detection
    │         │
    └────┬────┘
         │
  allowed / blocked decision
         │
         ▼
  Agent processing or block
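
The flow in the diagram can be sketched in plain Python. This is a minimal illustration, not the SDK's implementation: `CheckOutcome`, `moderation_check`, `pii_check`, and `run_checks` are hypothetical stand-ins, the keyword rules exist only to make the example runnable, and running the two checks concurrently is an assumption made for the sketch.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class CheckOutcome:
    """Hypothetical result of one check (not an SDK type)."""
    name: str
    violation: bool


async def moderation_check(text: str) -> CheckOutcome:
    # Stand-in rule: flag text containing the word "attack".
    return CheckOutcome("harmful_content", "attack" in text.lower())


async def pii_check(text: str) -> CheckOutcome:
    # Stand-in rule: flag text containing "@" (a crude e-mail signal).
    return CheckOutcome("pii_detected", "@" in text)


async def run_checks(text: str) -> tuple[bool, list[str]]:
    """Run both checks and merge them into one allowed/blocked decision."""
    outcomes = await asyncio.gather(moderation_check(text), pii_check(text))
    violations = [outcome.name for outcome in outcomes if outcome.violation]
    return (not violations, violations)


allowed, violations = asyncio.run(run_checks("Contact me at a@example.com"))
print(allowed, violations)  # False ['pii_detected']
```

The text is allowed only when no check reports a violation, which mirrors the single allowed / blocked decision at the bottom of the diagram.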

Provider configuration

The examples below cover Azure, AWS, and GCP, in that order.

config.toml
[security]
provider = "azure"
content_safety_endpoint = "https://your-content-safety.cognitiveservices.azure.com/"
content_safety_api_key = "${AZURE_CONTENT_SAFETY_KEY}"
language_endpoint = "https://your-language.cognitiveservices.azure.com/"
language_api_key = "${AZURE_LANGUAGE_KEY}"
moderation_threshold = 2
pii_confidence_threshold = 0.7

Python
from agenticstar_platform.security import AzureSecurityClient, AzureSecurityConfig

config = AzureSecurityConfig(
    content_safety_endpoint="https://your-content-safety.cognitiveservices.azure.com/",
    content_safety_api_key="your-api-key",
    language_endpoint="https://your-language.cognitiveservices.azure.com/",
    language_api_key="your-language-key",
)
client = AzureSecurityClient(config)
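
The config.toml example above references secrets through `${...}` placeholders, while the Python example uses literal strings. This guide does not state whether the SDK expands such placeholders itself, so the sketch below assumes you expand them before building the configuration. `expand_env` is a hypothetical helper, and `raw_settings` stands for the `[security]` table after parsing (for example with the standard-library `tomllib` on Python 3.11+).

```python
import os
import re

# Matches ${NAME} where NAME is a typical environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value):
    """Recursively replace ${VAR} placeholders with environment values."""
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, str):
        # os.environ[...] raises KeyError when the variable is not set, so a
        # missing secret fails loudly instead of becoming an empty string.
        return _PLACEHOLDER.sub(lambda match: os.environ[match.group(1)], value)
    return value


# Settings as they might look after parsing the [security] table.
raw_settings = {
    "provider": "azure",
    "content_safety_api_key": "${AZURE_CONTENT_SAFETY_KEY}",
    "moderation_threshold": 2,
}

# Demo value only; in practice the variable comes from the environment.
os.environ.setdefault("AZURE_CONTENT_SAFETY_KEY", "dummy-key-for-demo")
settings = expand_env(raw_settings)
print(settings["provider"], settings["moderation_threshold"])
```

The expanded values could then be passed to the configuration class in place of hard-coded keys.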

config.toml
[security]
provider = "aws"
region_name = "us-east-1"
guardrail_id = "your-guardrail-id"
guardrail_version = "DRAFT"
# PII detection path (defaults to "bedrock_guardrails" when omitted)
#   "bedrock_guardrails": Bedrock Guardrails sensitiveInformationPolicy
#     (multilingual, including Japanese; shares guardrail_id / guardrail_version)
#   "comprehend": Amazon Comprehend DetectPiiEntities (en/es only; legacy)
# pii_service = "bedrock_guardrails"

Python
from agenticstar_platform.security import AWSSecurityClient, AWSSecurityConfig

config = AWSSecurityConfig(
    region_name="us-east-1",
    guardrail_id="your-guardrail-id",
    # pii_service defaults to "bedrock_guardrails" (multilingual PII detection)
)
client = AWSSecurityClient(config)

Note

pii_service defaults to "bedrock_guardrails". Enable sensitiveInformationPolicy on the Bedrock Guardrails side (register NAME / EMAIL / PHONE / ADDRESS and so on in piiEntitiesConfig with action: ANONYMIZE). Explicitly setting "comprehend" switches to Amazon Comprehend DetectPiiEntities, but because it supports only en/es it is generally not recommended.
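
As a rough illustration of the piiEntitiesConfig registration described in the note, the sketch below builds the policy as a plain dictionary. It assumes the `type` / `action` entry shape used by the Bedrock guardrail creation request (commonly passed as `sensitiveInformationPolicyConfig`); verify the exact field names against the AWS documentation before use. `build_pii_policy` is a hypothetical helper, and no AWS call is made.

```python
def build_pii_policy(entity_types, action="ANONYMIZE"):
    """Build a sensitive-information policy dictionary for a guardrail."""
    return {
        "piiEntitiesConfig": [
            {"type": entity_type, "action": action} for entity_type in entity_types
        ]
    }


# The four entity types named in the note; extend the list as needed.
policy = build_pii_policy(["NAME", "EMAIL", "PHONE", "ADDRESS"])
print(policy["piiEntitiesConfig"][0])  # {'type': 'NAME', 'action': 'ANONYMIZE'}
```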

config.toml
[security]
provider = "gcp"
project_id = "your-project"
model_armor_template = "your-armor-template"
model_armor_region = "us-central1"
dlp_location = "global"

Python
from agenticstar_platform.security import GCPSecurityClient, GCPSecurityConfig

config = GCPSecurityConfig(
    project_id="your-project",
    model_armor_template="your-armor-template",
    model_armor_region="us-central1",
)
client = GCPSecurityClient(config)

Auto-detection factory

You can create a client from a configuration dictionary; the provider is detected automatically.

Python
from agenticstar_platform.security import create_pii_client, create_security_client

# Detect the provider automatically from the configuration dictionary
client = create_security_client(config_dict)

# Create a client dedicated to PII detection
pii_client = create_pii_client(config_dict)
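
This guide does not describe how the factory detects the provider. One plausible approach, shown here purely as an assumption, is to dispatch on the `provider` key used in the config.toml examples. `StubClient` and `create_stub_client` are hypothetical and are not part of the SDK.

```python
from dataclasses import dataclass


@dataclass
class StubClient:
    """Hypothetical stand-in for a provider-specific security client."""
    provider: str
    settings: dict


SUPPORTED_PROVIDERS = ("azure", "aws", "gcp")


def create_stub_client(config: dict) -> StubClient:
    """Pick a client based on the 'provider' key, rejecting unknown values."""
    provider = str(config.get("provider", "")).lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported security provider: {provider!r}")
    # Everything except the provider name is passed through as settings.
    settings = {key: value for key, value in config.items() if key != "provider"}
    return StubClient(provider=provider, settings=settings)


stub = create_stub_client({"provider": "aws", "region_name": "us-east-1"})
print(stub.provider, stub.settings)  # aws {'region_name': 'us-east-1'}
```

Failing fast on an unknown provider keeps a misconfigured deployment from silently running without safety checks.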

Content moderation

Detects harmful content in user input and agent output.

Python
result = await client.check_content_moderation(
    text="Text to check",
    threshold=2,  # severity 0-6; blocked at or above this threshold
)

print(result.blocked)      # True (blocked) / False (allowed)
print(result.categories)   # {ContentCategory.HATE: 0, ContentCategory.VIOLENCE: 4, ...}
print(result.threshold)    # the threshold that was applied

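Detection categories and severity

ContentCategory	Description	Severity (0-6)
`HATE`	Hate speech and discrimination	0 = safe, 2 = low, 4 = medium, 6 = high
`SEXUAL`	Sexual content	Same as above
`SELF_HARM`	Self-harm	Same as above
`VIOLENCE`	Violent content	Same as above
`PROFANITY`	Profanity	Same as above
`INSULT`	Insulting language	Same as above
`THREAT`	Threatening language	Same as above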
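
The threshold rule (block when a severity is at or above the threshold) can be sketched as follows. `Category` is an illustrative two-member subset of the table, not the SDK's ContentCategory enum, and `is_blocked` is a hypothetical helper.

```python
from enum import Enum


class Category(Enum):
    """Illustrative subset of the categories in the table (not the SDK enum)."""
    HATE = "hate"
    VIOLENCE = "violence"


def is_blocked(categories: dict, threshold: int) -> bool:
    """Return True when any category's severity is at or above the threshold."""
    return any(severity >= threshold for severity in categories.values())


scores = {Category.HATE: 0, Category.VIOLENCE: 4}
print(is_blocked(scores, threshold=2))  # True: VIOLENCE (4) >= 2
print(is_blocked(scores, threshold=6))  # False: no category reaches 6
```

With the default-looking threshold of 2, anything rated low or higher is blocked; raising it to 6 blocks only high-severity content.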

Prompt shield

Detects prompt injection attacks.

Python
result = await client.check_prompt_shield(
    user_prompt="User input text",
    documents=["Reference document 1", "Reference document 2"],
)

print(result.attack_detected)  # True / False
print(result.attack_type)      # attack type (when detected)
print(result.confidence)       # confidence score

check_prompt_shield() automatically trims each document to 10,000 characters to stay within the Azure API limit.
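
The trimming step amounts to a per-document slice. The sketch below assumes the beginning of each document is kept, which this guide does not confirm; `trim_documents` is a hypothetical helper, not an SDK function.

```python
MAX_DOCUMENT_CHARS = 10_000  # the limit quoted in this guide


def trim_documents(documents: list[str], limit: int = MAX_DOCUMENT_CHARS) -> list[str]:
    """Cut each document to at most `limit` characters, keeping the beginning."""
    return [document[:limit] for document in documents]


docs = ["short reference", "x" * 12_345]
print([len(document) for document in trim_documents(docs)])  # [15, 10000]
```

Because content past the limit is dropped, very long documents may be worth splitting before they are checked.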

PII detection

Detects personal information in text.

Azure: PII Detection API (multilingual)
AWS: Bedrock Guardrails sensitiveInformationPolicy by default (multilingual, including Japanese). Amazon Comprehend (en/es only) is used only when pii_service="comprehend" is set explicitly
GCP: Cloud DLP (multilingual)

Python
# Japanese sample: "Tanaka Taro's phone number is 090-1234-5678. His email is tanaka@example.com."
result = await client.detect_pii(
    text="田中太郎の電話番号は090-1234-5678です。メールはtanaka@example.comです。",
    mask=True,
    language="ja",
)

print(result.success)              # True
print(result.masked_text)          # "田中太郎の電話番号は***です。メールは***です。"
print(result.categories_detected)  # [PIICategory.PHONE_NUMBER, PIICategory.EMAIL]
for entity in result.entities:
    print(f"  {entity.category}: {entity.text} (offset={entity.offset}, len={entity.length}, confidence={entity.confidence})")
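
To show how the offset and length fields relate to the masked text, here is a minimal sketch of span-based masking. `Entity` and `mask_text` are hypothetical stand-ins, and the sketch assumes offsets are Python string indices; a provider may report offsets in a different unit, so check before relying on them.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical stand-in mirroring the offset/length fields of an entity."""
    category: str
    offset: int
    length: int


def mask_text(text: str, entities: list[Entity], mask: str = "***") -> str:
    """Replace each entity span with `mask`, working right to left."""
    # Replacing from the end keeps earlier offsets valid after each substitution.
    for entity in sorted(entities, key=lambda item: item.offset, reverse=True):
        text = text[: entity.offset] + mask + text[entity.offset + entity.length :]
    return text


text = "田中太郎の電話番号は090-1234-5678です。メールはtanaka@example.comです。"
phone, email = "090-1234-5678", "tanaka@example.com"
entities = [
    Entity("PHONE_NUMBER", text.index(phone), len(phone)),
    Entity("EMAIL", text.index(email), len(email)),
]
print(mask_text(text, entities))  # 田中太郎の電話番号は***です。メールは***です。
```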

PII categories (main ones)

PIICategory	Example targets
`PERSON_NAME`	田中太郎 (Tanaka Taro)
`PHONE_NUMBER`	090-1234-5678, 03-1234-5678
`EMAIL`	user@example.com
`CREDIT_CARD`	4111-1111-1111-1111
`NATIONAL_ID`	My Number (12 digits)
`PASSPORT_NUMBER`	TK1234567
`ADDRESS`	東京都渋谷区...
`IP_ADDRESS`	192.168.1.1
`API_KEY`	sk-...
`CONNECTION_STRING`	DefaultEndpointsProtocol=...

Unified security check

The check_security() method runs content moderation, prompt shield, and PII detection in a single call.

Python
result = await client.check_security(
    text="Text to check",
    check_moderation=True,
    check_prompt_shield=True,
    check_pii=False,           # defaults to False
    fail_on_error=True,
)

print(result.allowed)              # True (allowed) / False (blocked)
print(result.violations)           # ["harmful_content", "jailbreak_attack", "pii_detected", ...]
print(result.content_moderation)   # ContentModerationResult or None
print(result.prompt_shield)        # PromptShieldResult or None
print(result.pii_detection)        # PIIDetectionResult or None
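
One way to use the violations list is to map each code to a user-facing message. The three codes come from the example output in this section; the message wording, `VIOLATION_MESSAGES`, and `describe_violations` are assumptions made for illustration.

```python
VIOLATION_MESSAGES = {
    "harmful_content": "The text contains content that violates the content policy.",
    "jailbreak_attack": "The text looks like a prompt injection attempt.",
    "pii_detected": "The text contains personal information.",
}


def describe_violations(violations: list[str]) -> list[str]:
    """Translate violation codes into messages, keeping unknown codes visible."""
    return [
        VIOLATION_MESSAGES.get(code, f"Policy violation: {code}")
        for code in violations
    ]


for line in describe_violations(["jailbreak_attack", "pii_detected"]):
    print(line)
```

Falling back to the raw code for unknown violations avoids hiding a violation type that is not in the mapping.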

ContentSafetyValidator

ContentSafetyValidator is a wrapper class that validates and marks external content (content retrieved from the web, a browser, an API, or MCP) according to the guardrail configuration.

Python
from agenticstar_platform.security import ContentSafetyValidator, GuardrailsConfig

validator = ContentSafetyValidator(
    security_client=client,
    guardrails=guardrails_config,  # GuardrailsConfig loaded from the DB (defaults apply when omitted)
)

# Validate and sanitize external content (source_type is required)
result = await validator.validate_and_sanitize(
    content=external_content,
    source_type="web",                  # "web" / "browser" / "api" / "mcp"
    source_url="https://example.com",   # optional
    user_prompt=user_message,           # used for Prompt Shield validation (optional)
)

if result.is_safe:
    print(result.marked_content)  # marked content (empty string when unsafe)
print(result.metadata)            # validation result metadata
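
This guide does not specify what the marked content looks like. As a purely illustrative sketch, marking could mean wrapping untrusted text in explicit delimiters that record where it came from. The delimiter format and `mark_external_content` are hypothetical and do not reflect the validator's real output.

```python
def mark_external_content(content: str, source_type: str, source_url: str = "") -> str:
    """Wrap untrusted content in explicit delimiters (illustrative format only)."""
    allowed_sources = {"web", "browser", "api", "mcp"}
    if source_type not in allowed_sources:
        raise ValueError(f"Unknown source_type: {source_type!r}")
    header = f"[EXTERNAL CONTENT source={source_type}"
    if source_url:
        header += f" url={source_url}"
    header += "]"
    return f"{header}\n{content}\n[END EXTERNAL CONTENT]"


print(mark_external_content("Page text", "web", "https://example.com"))
```

Delimiters of this kind make it easier for downstream prompts to treat the wrapped text as data instead of instructions.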

Usage patterns in agents

Checking both input and output

Python
async def safe_agent_run(user_message: str) -> str:
    # 1. Check the input
    input_result = await client.check_security(text=user_message)
    if not input_result.allowed:
        return f"Request blocked: {', '.join(input_result.violations)}"

    # 2. Run the agent
    response = await run_agent_logic(user_message)

    # 3. Check the output
    output_result = await client.check_security(text=response)
    if not output_result.allowed:
        return "A content policy violation was detected while generating the response."

    return response
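
To try the input/output pattern without cloud credentials, the sketch below swaps in stand-ins. `FakeResult`, `FakeSecurityClient`, `guarded_run`, and both agents are hypothetical; only the `allowed` and `violations` fields and the `check_security(text=...)` call shape are taken from the example above.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeResult:
    """Hypothetical stand-in exposing the two fields the pattern reads."""
    allowed: bool
    violations: list = field(default_factory=list)


class FakeSecurityClient:
    """Hypothetical client that blocks any text containing 'forbidden'."""

    async def check_security(self, text: str) -> FakeResult:
        if "forbidden" in text:
            return FakeResult(allowed=False, violations=["harmful_content"])
        return FakeResult(allowed=True)


async def guarded_run(client, agent, user_message: str) -> str:
    """Check the input, run the agent, then check the output."""
    input_result = await client.check_security(text=user_message)
    if not input_result.allowed:
        return "blocked input: " + ", ".join(input_result.violations)
    response = await agent(user_message)
    output_result = await client.check_security(text=response)
    if not output_result.allowed:
        return "blocked output"
    return response


async def echo_agent(message: str) -> str:
    return f"echo: {message}"


async def leaky_agent(message: str) -> str:
    # Simulates an agent whose answer violates the policy.
    return "forbidden detail"


fake = FakeSecurityClient()
print(asyncio.run(guarded_run(fake, echo_agent, "hello")))              # echo: hello
print(asyncio.run(guarded_run(fake, echo_agent, "forbidden request")))  # blocked input: harmful_content
print(asyncio.run(guarded_run(fake, leaky_agent, "hello")))             # blocked output
```

Passing the client and the agent as arguments keeps the guard logic testable without network access.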

Next steps

SDK API Reference: Security Module

Full specification of SecurityClient / ContentSafetyValidator

Memory Guide

Validating content before it is saved to memory
