Security Guide
This guide explains the security configuration of the AGENTIC STAR marketplace edition and the steps to harden it for production environments.
Initial Configuration Policy
The marketplace edition defaults to a minimal-cost configuration. All components are configured with a single replica (replica=1), and you can scale up incrementally after purchase based on your security and availability requirements.
Zero-Trust Communication Architecture
AGENTIC STAR employs a communication architecture based on the zero-trust security model.
Pods in the marketplace-agent namespace have Envoy sidecar proxies automatically injected. Outbound traffic passes through Envoy and is subject to authorization control by the ExtAuth Service (external authorization service).
| Component | Role |
|---|---|
| Envoy Injection Webhook | A MutatingWebhook that automatically injects an Envoy sidecar container when a Pod is created |
| Envoy Sidecar | Intercepts outbound traffic within each Pod and sends authorization requests to the ExtAuth Service |
| ExtAuth Service | An external authorization service that permits or denies traffic based on authorization policies |
Security Configuration Levels
AGENTIC STAR defines 3 configuration levels based on security and availability requirements.
| Level | Name | Replicas | FailurePolicy | AntiAffinity | PDB | Zero-Trust Guarantee |
|---|---|---|---|---|---|---|
| Level 1 | Initial (Default) | 1 | Ignore | None | None | Best-effort |
| Level 2 | Recommended | 2 | Fail | Yes | Yes | Guaranteed |
| Level 3 | Production | 3+ | Fail | Yes | Yes | Guaranteed + High Availability |
Differences in Zero-Trust Guarantee Levels
- Best-effort: Under normal conditions, Pods receive an Envoy sidecar and their outbound traffic is authorized. If the Webhook is down, however, Pods can start without a sidecar and bypass authorization.
- Guaranteed: If the Webhook is down, Pod creation is rejected, so no Pod can start without a sidecar.
Recommended Use Cases for Each Level
| Level | Recommended Use | Description |
|---|---|---|
| Level 1 | Development, PoC, functional validation | Cost-optimized. For environments without strict security requirements |
| Level 2 | Minimum production configuration | The minimum configuration that guarantees zero-trust communication. Recommended when starting production use |
| Level 3 | Mission-critical production environments | Zero-trust guaranteed with high availability. For workloads where service continuity is critical |
Initial Configuration (Level 1) Constraints
The initial configuration prioritizes cost optimization and has the following constraints. Upgrading to Level 2 or higher is strongly recommended for production use.
Single-Replica Risks
| Component | Initial Configuration | Risk on Failure |
|---|---|---|
| ExtAuth Service | replica=1, PDB disabled | Zero-trust authorization stops |
| Envoy Injection Webhook | replica=1, PDB disabled | Pods start without sidecar injection |
| NGINX Ingress Controller | replica=1 | All external access stops |
| Keycloak (Authentication) | replica=1 | User authentication unavailable |
| Gate Services (Routing) | replica=1 | API routing stops |
Envoy Injection Webhook Constraints
In the initial configuration, failurePolicy: Ignore is set. If the Webhook Pod goes down because of a failure or node maintenance, Pods created during the outage start without a sidecar. Those Pods can bypass authorization control and communicate directly with external services. They remain without a sidecar until they are restarted after the Webhook has recovered.
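The difference between Best-effort and Guaranteed comes down to one field on the Webhook's MutatingWebhookConfiguration. It is the same field that the FailurePolicy check in the Level 1 to Level 2 upgrade steps reads with the jsonpath '{.webhooks[0].failurePolicy}'. The excerpt below is a sketch, not the chart's actual output. It assumes a single webhook entry under the configuration name envoy-injector-webhook, and the entry name and rules are hypothetical. Required fields such as clientConfig are omitted, so it is not meant to be applied.

```yaml
# Illustrative excerpt only: clientConfig and other fields are omitted,
# so this is not meant to be applied as-is.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: envoy-injector-webhook
webhooks:
  - name: envoy-injector.proxy-system.svc  # hypothetical entry name
    # Level 1 (Ignore): if the webhook call fails or times out, the Pod is
    # admitted without a sidecar.
    # Level 2+ (Fail): the same failure rejects the Pod creation request.
    failurePolicy: Ignore
    sideEffects: None
    admissionReviewVersions: ["v1"]
    rules:                                 # hypothetical: intercept Pod creation
      - operations: ["CREATE"]
        apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
```

failurePolicy governs timeouts as well as outright errors. Under Ignore, a slow Webhook can therefore let Pods through without a sidecar, not only a crashed one.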
NetworkPolicy Not Applied
NetworkPolicy is not applied in the initial configuration. Any Pod in the cluster can communicate with any service.
Recommended Security Values by Component
Security Infrastructure
ExtAuth Service
| Setting | Level 1 | Level 2 | Level 3 |
|---|---|---|---|
| replicaCount | 1 | 2 | 3 |
| autoscaling.minReplicas | 1 | 2 | 3 |
| pdb.enabled | false | true | true |
| pdb.minAvailable | N/A | 1 | 1 |
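For reference, setting pdb.enabled: true with pdb.minAvailable: 1 should produce a PodDisruptionBudget similar to the sketch below. The object name is an assumption. The app: extauth-service selector is the label used by the NetworkPolicy and verification commands in this guide.

```yaml
# Sketch of the PDB implied by pdb.enabled: true / pdb.minAvailable: 1
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: extauth-service        # assumed name
  namespace: proxy-system
spec:
  # Voluntary disruptions (e.g. kubectl drain) are refused if they would
  # leave fewer than 1 available extauth-service Pod.
  minAvailable: 1
  selector:
    matchLabels:
      app: extauth-service
```

A minAvailable: 1 budget only makes sense with two or more replicas. With a single replica it allows zero disruptions, so node drains would stall on that Pod.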
Envoy Injection Webhook
| Setting | Level 1 | Level 2 | Level 3 |
|---|---|---|---|
| webhook.replicaCount | 1 | 2 | 2 |
| webhook.pdb.enabled | false | true | true |
| webhook.pdb.minAvailable | N/A | 1 | 1 |
| webhook.mutating.failurePolicy | Ignore | Fail | Fail |
When changing failurePolicy to Fail, ensure the Webhook replica count is 2 or higher. Setting Fail with a single replica will block all Pod creation when the Webhook is down.
Authentication and Routing
Keycloak
| Setting | Level 1 | Level 2 | Level 3 |
|---|---|---|---|
| replicas | 1 | 2 | 3 |
| JDBC-ping clustering | Not needed | Enabled (automatic) | Enabled (automatic) |
Keycloak automatically enables JDBC-ping clustering when replicas is 2 or more. Session sharing ensures authentication sessions continue even if one instance fails.
NGINX Ingress Controller
| Setting | Level 1 | Level 2 | Level 3 |
|---|---|---|---|
| autoscaling.minReplicas | 1 | 2 | 3 |
| autoscaling.maxReplicas | 5 | 5 | 10 |
| autoscaling.targetCPU | 70% | 70% | 70% |
| autoscaling.targetMemory | 80% | 80% | 80% |
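As a sketch, the Level 3 column could be expressed as follows. This assumes AGENTIC STAR deploys the upstream ingress-nginx chart, where these settings sit under controller.autoscaling and the CPU and memory targets are named targetCPUUtilizationPercentage and targetMemoryUtilizationPercentage. If the marketplace edition wraps the chart in its own chart, the key prefix may differ.

```yaml
# ingress-nginx values (Level 3), assuming the upstream chart layout
controller:
  autoscaling:
    enabled: true                          # creates a HorizontalPodAutoscaler
    minReplicas: 3
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70     # scale out above 70% average CPU
    targetMemoryUtilizationPercentage: 80  # scale out above 80% average memory
```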
Gate Services
| Setting | Level 1 | Level 2 | Level 3 |
|---|---|---|---|
| autoscaling.minReplicas | 1 | 2 | 2 |
Application
Agent Executor / AgenticAI Admin / AgenticAI ExtAPI
| Setting | Level 1 | Level 2 | Level 3 |
|---|---|---|---|
| autoscaling.minReplicas | 1 | 2 | 2 |
Data Stores
Qdrant (Vector DB)
| Setting | Level 1 | Level 2 | Level 3 |
|---|---|---|---|
| replicaCount | 1 | 1 | 3 (cluster: true) |
| cluster | false | false | true |
| loadBalancer.enabled | false | false | false |
| dashboard.enabled | false | false | false |
loadBalancer.enabled and dashboard.enabled must be set to false in production environments. Enabling them allows external access to Qdrant, creating a risk of vector data leakage.
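A Level 3 override for Qdrant might look like the following sketch. It assumes the dotted setting names in the table map directly onto nested YAML keys in the Qdrant chart used by AGENTIC STAR.

```yaml
# Qdrant values (Level 3), assuming the table's keys map to nested YAML
replicaCount: 3
cluster: true          # run the 3 replicas as one Qdrant cluster
loadBalancer:
  enabled: false       # keep Qdrant off external load balancers
dashboard:
  enabled: false       # do not expose the web dashboard
```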
MongoDB
| Setting | Level 1 | Level 2 | Level 3 |
|---|---|---|---|
| podAntiAffinityPreset | soft | soft | hard |
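In Bitnami charts, which use the podAntiAffinityPreset setting, soft renders a preferredDuringSchedulingIgnoredDuringExecution rule and hard renders a requiredDuringSchedulingIgnoredDuringExecution rule. Assuming the MongoDB chart follows that convention, the Level 3 setting produces Pod affinity roughly like the sketch below. The label values are illustrative; the actual selector labels depend on the chart and release name.

```yaml
# Roughly what podAntiAffinityPreset: hard renders into the MongoDB Pod spec
affinity:
  podAntiAffinity:
    # "hard": the scheduler may only place a MongoDB Pod on a node that
    # does not already run a Pod with these labels.
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: mongodb      # illustrative
            app.kubernetes.io/instance: mongodb  # illustrative release name
```

Because the rule is mandatory, the hard preset needs at least as many schedulable nodes as MongoDB replicas. If there are fewer nodes, the surplus Pods stay Pending.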
Monitoring Infrastructure
Prometheus
| Setting | Level 1 | Security Recommendation |
|---|---|---|
| securityContext.runAsNonRoot | true | Keep (no change needed) |
| securityContext.runAsUser | 1000 | Keep (no change needed) |
| securityContext.fsGroup | 2000 | Keep (no change needed) |
| retention | 30d | Extend based on audit requirements |
| storage | 50Gi | Increase based on data volume |
Loki
| Setting | Level 1 | Security Recommendation |
|---|---|---|
| auth_enabled | false | Consider true for production |
| retention_period | 7d | Extend based on audit requirements (30d or more) |
| storage | 10Gi | Increase based on log volume |
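For reference, the Loki settings above correspond to the following keys in Loki's own configuration file. This is a sketch. If Loki is deployed with the grafana/loki Helm chart, the same keys sit under the loki: block of the values file.

```yaml
# Loki configuration excerpt (sketch): multi-tenancy and 30-day retention
auth_enabled: true          # requests must carry an X-Scope-OrgID tenant header

limits_config:
  retention_period: 30d     # delete log data older than 30 days

compactor:
  retention_enabled: true   # retention_period is enforced only when this is set
```

When auth_enabled is on, every client, including Alloy and the Grafana data source, must send a tenant ID. Otherwise Loki rejects their pushes and queries. Depending on the Loki version, enabling retention may also require compactor.delete_request_store to be set.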
Network Isolation with NetworkPolicy
NetworkPolicy is not included in the initial configuration. The following NetworkPolicies are recommended based on your cluster's security requirements.
NetworkPolicy requires a compatible CNI plugin (Calico / Cilium, etc.). The AGENTIC STAR AKS cluster has Calico enabled, so NetworkPolicies can be applied directly.
Qdrant Access Restriction
Qdrant stores vectorized data. Restrict access sources to prevent unauthorized data leakage.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: qdrant-allow-authorized-only
  namespace: qdrant
spec:
  podSelector:
    matchLabels:
      app: qdrant
  policyTypes:
    - Ingress
  ingress:
    # Intra-namespace traffic (cluster P2P)
    - from:
        - podSelector: {}
    # Access from authorized external namespaces
    - from:
        - namespaceSelector:
            matchLabels:
              # The namespace must carry this label; alternatively match the
              # built-in kubernetes.io/metadata.name label.
              name: agent-executor
      ports:
        - port: 6333   # REST API
          protocol: TCP
        - port: 6334   # gRPC API
          protocol: TCP
```
You can also apply an equivalent NetworkPolicy via Helm by configuring networkPolicy.enabled: true and networkPolicy.allowedNamespaces in the Qdrant Helm chart.
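As a sketch, the Helm route could look like the following. The value names networkPolicy.enabled and networkPolicy.allowedNamespaces come from the text above. Treating allowedNamespaces as a list of namespace names is an assumption, so check the chart's values.yaml before relying on it.

```yaml
# Qdrant chart values (sketch); the allowedNamespaces format is assumed
networkPolicy:
  enabled: true
  allowedNamespaces:
    - agent-executor   # mirrors the agent-executor rule in the manifest above
```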
Database Access Restriction
Restrict PostgreSQL access to only the components that require it.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-postgresql-access
  namespace: autonomous-agent
spec:
  podSelector:
    matchLabels:
      app: postgresql
  policyTypes:
    - Ingress
  ingress:
    # Each namespaceSelector is a separate entry, so traffic is allowed from
    # any one of these namespaces (logical OR). The namespaces must carry the
    # matching "name" label.
    - from:
        - namespaceSelector:
            matchLabels:
              name: agent-executor
        - namespaceSelector:
            matchLabels:
              name: agenticai-admin
        - namespaceSelector:
            matchLabels:
              name: gate-services
      ports:
        - port: 5432
          protocol: TCP
```
ExtAuth Service Access Restriction
Since the ExtAuth Service is central to authorization control, restrict access to Envoy sidecars only.
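```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: extauth-service-allow-envoy-only
  namespace: proxy-system
spec:
  podSelector:
    matchLabels:
      app: extauth-service
  policyTypes:
    - Ingress
  ingress:
    # namespaceSelector and podSelector are in the same entry, so both must
    # match (logical AND): Pods labeled sidecar=envoy in any namespace.
    - from:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              sidecar: envoy
```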
Monitoring Infrastructure Access Restriction
Restrict access to monitoring infrastructure such as Prometheus and Loki to prevent leakage of metrics and log data.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: prometheus-allow-internal-only
  namespace: observability
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: prometheus
  policyTypes:
    - Ingress
  ingress:
    # Any Pod in the observability namespace (e.g. Grafana, Alertmanager)
    - from:
        - podSelector: {}
    # Remote Write from Alloy Pods in any namespace
    - from:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              app.kubernetes.io/name: alloy
      ports:
        - port: 9090
          protocol: TCP
```
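The policy above restricts only Prometheus. An analogous policy for Loki might look like the following sketch. It assumes that Loki runs in the same observability namespace, that its Pods carry the label app.kubernetes.io/name: loki, and that it listens on its default HTTP port 3100. Adjust these values to match your deployment.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: loki-allow-internal-only
  namespace: observability          # assumed: same namespace as Prometheus
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: loki  # assumed Pod label
  policyTypes:
    - Ingress
  ingress:
    # Any Pod in the observability namespace (e.g. Grafana, Loki components)
    - from:
        - podSelector: {}
    # Log pushes from Alloy Pods in any namespace
    - from:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              app.kubernetes.io/name: alloy
      ports:
        - port: 3100                # Loki HTTP API (default port)
          protocol: TCP
```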
TLS / SSL Configuration
Self-signed certificates are used in the initial configuration. For production environments, custom domains and proper SSL certificates are required.
For domain acquisition, DNS configuration, and SSL certificate setup instructions, see the Domain Settings Guide.
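Whatever certificate source you use, the end state on each Ingress resembles the sketch below: a tls block that references a Secret containing the CA-issued certificate. The host, Secret, Ingress, and backend Service names are placeholders, and the nginx ingress class name is an assumption. With ingress-nginx, HTTP-to-HTTPS redirect is already on for hosts listed under tls, so the annotation only makes that explicit.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: agentic-star-example                         # placeholder
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true" # redirect HTTP to HTTPS
spec:
  ingressClassName: nginx                            # assumed class name
  tls:
    - hosts:
        - agentic.example.com                        # placeholder domain
      secretName: agentic-star-tls                   # TLS Secret with the CA-issued cert and key
  rules:
    - host: agentic.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-backend                # placeholder Service
                port:
                  number: 80
```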
Upgrade Steps
Level 1 to Level 2 (Recommended Configuration)
Step 1: Create an ExtAuth Service values override file
```yaml
# security-upgrade-values.yaml
# Level 2: Recommended configuration (zero-trust guaranteed)
replicaCount: 2
autoscaling:
  minReplicas: 2
pdb:
  enabled: true
  minAvailable: 1
webhook:
  replicaCount: 2
  pdb:
    enabled: true
    minAvailable: 1
  mutating:
    failurePolicy: Fail
```
AntiAffinity is automatically applied using the Helm chart default value (preferred), so no explicit override is needed.
Step 2: Increase replica counts for each component
Set minReplicas to 2 or higher in the values file for each of the following components.
```yaml
autoscaling:
  minReplicas: 2
```
Target components:
- agent-executor
- agenticai-admin
- agenticai-extapi
- gate-services
- ingress-nginx
- librechat-auth (set replicas: 2 instead)
Step 3: Run Helm upgrade
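```bash
# If you do not already have current-values.yaml, export the deployed values first
helm get values extauth-service -n proxy-system -o yaml > current-values.yaml

helm upgrade extauth-service ./extauth-service \
  -n proxy-system \
  -f current-values.yaml \
  -f security-upgrade-values.yaml
```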
Step 4: Verify the changes
```bash
# Verify Pod distribution
kubectl get pods -n proxy-system -o wide

# Verify PDB creation
kubectl get pdb -n proxy-system

# Verify FailurePolicy
kubectl get mutatingwebhookconfiguration envoy-injector-webhook \
  -o jsonpath='{.webhooks[0].failurePolicy}'
# Expected: Fail

# Verify Envoy injection
kubectl rollout restart deployment/agent-executor -n agent-executor
kubectl get pods -n agent-executor
# Expected: All Pods show READY 2/2
```
Level 2 to Level 3 (Production Configuration)
Step 1: Create a values override file
```yaml
# production-values.yaml
# Level 3: Production configuration (high availability)
replicaCount: 3
autoscaling:
  minReplicas: 3
webhook:
  replicaCount: 2
```
Step 2: Run Helm upgrade
```bash
helm upgrade extauth-service ./extauth-service \
  -n proxy-system \
  -f current-values.yaml \
  -f security-upgrade-values.yaml \
  -f production-values.yaml
```
Step 3: Verify the changes
```bash
# Verify 3 extauth-service Pods are running
kubectl get pods -n proxy-system -l app=extauth-service

# Verify 2 webhook Pods are running
kubectl get pods -n proxy-system -l app=envoy-injector-webhook
```
Production Security Checklist
Zero-Trust Communication
- ExtAuth Service replica count is 2 or higher
- Envoy Injection Webhook replica count is 2 or higher
- failurePolicy is set to Fail
- PDB is enabled
- Envoy sidecar is injected in all workload Pods (READY 2/2)
Network Isolation
- Qdrant access is restricted to only the required namespaces
- PostgreSQL access is restricted to only the required components
- ExtAuth Service access is restricted to Envoy sidecars only
TLS / Certificates
- Self-signed certificates have been replaced with CA-issued certificates
- SSL redirect is enabled on all Ingress resources (enabled by default)
- Certificate auto-renewal is configured
Data Stores
- Qdrant loadBalancer.enabled is false
- Qdrant dashboard.enabled is false
- PostgreSQL connection credentials are managed via Kubernetes Secrets
Availability
- All component replica counts are 2 or higher
- PDB is enabled
- Pod AntiAffinity is configured, and replicas are confirmed to be running on different nodes (kubectl get pods -o wide)
Monitoring and Logging
- Prometheus metrics retention period meets audit requirements
- Loki log retention period meets audit requirements
- Alert notifications are properly configured
Glossary
| Term | Description |
|---|---|
| FailurePolicy: Ignore | A setting that allows Pod creation to continue even when the Webhook is down. There is a risk of Pods starting without sidecar injection |
| FailurePolicy: Fail | A setting that rejects Pod creation when the Webhook is down. Pods without sidecars will not start, but new Pod creation is blocked during Webhook outages |
| PodDisruptionBudget (PDB) | A mechanism that limits voluntary disruptions, such as node drains during maintenance or upgrades, so that a minimum number of Pods stays available |
| Pod AntiAffinity | A setting to distribute Pods of the same workload across different nodes |
| NetworkPolicy | A standard Kubernetes network access control feature. Restricts Pod-to-Pod communication at the Ingress / Egress level. Requires CNI plugin support |