Overview

Follow these best practices to build secure, reliable, and maintainable AI agent systems with AgentWarden.

Architecture Patterns

1. Single Responsibility Agents

Create separate agents for different purposes rather than one “super agent”.
# Customer support agent
support_agent = Agent(
    name="customer-support-bot",
    permissions=["stripe.refund", "email.send", "ticket.update"]
)

# Data processing agent
data_agent = Agent(
    name="data-processing-bot",
    permissions=["database.read", "s3.upload", "api.fetch"]
)

# DevOps agent
devops_agent = Agent(
    name="devops-bot",
    permissions=["deploy.staging", "deploy.production"]
)
Benefits:
  • Better security isolation
  • Clearer audit trails
  • Easier to debug
  • Simpler permission management

2. Check-Execute-Log Pattern

Always follow this pattern for agent actions:
from typing import Any, Callable

def execute_agent_action(agent_id: str, action: str, context: dict,
                         execute_fn: Callable[[], Any]) -> dict:
    """
    Standard pattern for all agent actions
    
    1. Check permission
    2. Execute if allowed
    3. Log the outcome
    """
    # 1. CHECK
    result = guard.check(agent_id, action, context)
    
    if not result.allowed:
        # Log denial
        guard.log(agent_id, action, "denied", context)
        
        if result.requires_approval:
            return {"status": "pending_approval", "approval_id": result.approval_id}
        else:
            return {"status": "denied", "reason": result.reason}
    
    # 2. EXECUTE
    try:
        outcome = execute_fn()
        
        # 3. LOG SUCCESS
        guard.log(agent_id, action, "success", {**context, "result": outcome})
        return {"status": "success", "result": outcome}
        
    except Exception as e:
        # 3. LOG FAILURE
        guard.log(agent_id, action, "failed", {**context, "error": str(e)})
        return {"status": "failed", "error": str(e)}
Never skip any step:
  • ❌ Don’t execute without checking
  • ❌ Don’t check but forget to log
  • ❌ Don’t log only successes

3. Tiered Permissions

Use tiered permissions for different risk levels:
# Setup in AgentWarden dashboard:
# 
# Action: stripe.refund.small
# Max Amount: $50
# Requires Approval: No
#
# Action: stripe.refund.medium  
# Max Amount: $200
# Requires Approval: No
#
# Action: stripe.refund.large
# Max Amount: $1000
# Requires Approval: Yes

def process_refund(amount: float):
    """Automatically route to appropriate permission tier"""
    if amount <= 50:
        action = "stripe.refund.small"
    elif amount <= 200:
        action = "stripe.refund.medium"
    else:
        action = "stripe.refund.large"
    
    result = guard.check(AGENT_ID, action, {"amount": amount})
    
    # Handle based on result...
Benefits:
  • Automatic escalation for high-risk actions
  • Reduce approval bottlenecks for low-risk actions
  • Clear risk boundaries
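
The if/elif routing above can also be driven by a small table, so the thresholds live in one place. The following is a minimal sketch, assuming the tier limits from the dashboard setup shown earlier; `REFUND_TIERS` and `refund_action_for` are hypothetical names, not part of AgentWarden. Unlike the chain above, it refuses amounts above the largest tier instead of routing them to `stripe.refund.large`.

```python
# Hypothetical tier table mirroring the dashboard setup above:
# (maximum amount in dollars, action name), ordered from smallest to largest.
REFUND_TIERS = [
    (50.0, "stripe.refund.small"),
    (200.0, "stripe.refund.medium"),
    (1000.0, "stripe.refund.large"),
]


def refund_action_for(amount: float) -> str:
    """Return the action name of the first tier whose limit covers the amount."""
    if amount <= 0:
        raise ValueError("refund amount must be positive")
    for max_amount, action in REFUND_TIERS:
        if amount <= max_amount:
            return action
    # No tier covers this amount; refuse instead of guessing a tier.
    raise ValueError(f"refund of {amount} exceeds the largest tier")
```

The returned name can then be passed to `guard.check` in place of the hand-written branches.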

Permission Management

4. Use Descriptive Action Names

Action names should clearly indicate what they do:
"stripe.refund"
"stripe.subscription.cancel"
"database.users.delete"
"email.marketing.send"
"deploy.production"
"api.external.call"
Naming convention (the resource part may be omitted or nested, as the examples show):
service.resource.operation
Examples:
  • stripe.refund - Stripe service, refund operation
  • database.users.delete - Database service, users resource, delete operation
  • api.sendgrid.email.send - API integration, SendGrid, email resource, send operation
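
A naming convention is easier to keep when it is checked automatically, for example in a unit test or at agent registration. This is a rough sketch under an assumed rule (two or more dot-separated segments of lowercase letters, digits, or underscores); AgentWarden itself may accept other names, and `is_valid_action_name` is a hypothetical helper.

```python
import re

# Assumed convention: two or more dot-separated segments, each made of
# lowercase letters, digits, or underscores, and starting with a letter.
ACTION_NAME = re.compile(r"[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+")


def is_valid_action_name(name: str) -> bool:
    """Return True if the name follows the dotted naming convention."""
    return ACTION_NAME.fullmatch(name) is not None
```

Under this rule `stripe.refund` and `api.sendgrid.email.send` pass, while a bare `refund` or a wildcard such as `email.*` is rejected.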

5. Environment-Specific Permissions

Use different permissions for different environments:
import os

env = os.getenv("ENVIRONMENT", "production")

# Staging: no approvals needed
if env == "staging":
    action = "deploy.staging"
    
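# Production: requires approval
elif env == "production":
    action = "deploy.production"

# Unknown environment: refuse rather than guess
else:
    raise ValueError(f"Unsupported ENVIRONMENT: {env!r}")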

result = guard.check(AGENT_ID, action, {"version": "v2.1.0"})
Dashboard setup:
Action: deploy.staging
Requires Approval: No

Action: deploy.production
Requires Approval: Yes

6. Regularly Review Permissions

Set up a quarterly permission audit:
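# Script to audit permissions
from datetime import datetime, timedelta, timezone

def audit_permissions():
    """Review all agent permissions and flag unused ones"""
    # Anything last used before this cutoff counts as stale
    cutoff = datetime.now(timezone.utc) - timedelta(days=90)
    agents = get_all_agents()
    
    for agent in agents:
        permissions = get_agent_permissions(agent.id)
        
        # Check last usage
        for perm in permissions:
            # Assumed to return a timezone-aware datetime, or None if never used
            last_used = get_last_log_for_action(agent.id, perm.action)
            
            if not last_used:
                print(f"⚠️ {agent.name}: '{perm.action}' never used")
            elif last_used < cutoff: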
                print(f"⚠️ {agent.name}: '{perm.action}' not used in 90 days")

# Run quarterly
audit_permissions()

Context and Logging

7. Provide Rich Context

Always include comprehensive context for permission checks and logs:
result = guard.check(
    agent_id=AGENT_ID,
    action="stripe.refund",
    context={
        "amount": 50.00,
        "customer_id": "cus_ABC123",
        "customer_email": "john@example.com",
        "order_id": "ord_XYZ789",
        "order_date": "2024-02-15",
        "reason": "defective_product",
        "ticket_id": "SUPPORT-12345",
        "agent_version": "v2.1.0"
    }
)
Why it matters:
  • Approvers can make informed decisions
  • Better debugging when issues occur
  • Compliance and audit requirements
  • Analytics and reporting
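
One way to keep context consistently rich is to check for required fields before calling the permission check. The sketch below assumes a per-action list of required keys that you maintain yourself; `REQUIRED_CONTEXT` and `missing_context_keys` are hypothetical and not part of AgentWarden.

```python
# Hypothetical per-action requirements; adjust to what your approvers need.
REQUIRED_CONTEXT = {
    "stripe.refund": {"amount", "customer_id", "order_id", "reason"},
}


def missing_context_keys(action: str, context: dict) -> set:
    """Return the required keys that are absent or None in the context."""
    required = REQUIRED_CONTEXT.get(action, set())
    return {key for key in required if context.get(key) is None}
```

A caller could refuse to proceed, or log a warning, whenever the returned set is not empty.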

8. Structured Error Context

When logging failures, include structured error information:
import traceback
from datetime import datetime, timezone

try:
    process_refund(amount)
    
except StripeError as e:
    guard.log(
        agent_id=AGENT_ID,
        action="stripe.refund",
        status="failed",
        context={
            "amount": amount,
            "error_type": type(e).__name__,
            "error_code": getattr(e, 'code', None),
            "error_message": str(e),
            "stripe_request_id": getattr(e, 'request_id', None),
            "retry_count": retry_count,
            # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC time
            "timestamp": datetime.now(timezone.utc).isoformat()
        }
    )
    
except Exception as e:
    guard.log(
        agent_id=AGENT_ID,
        action="stripe.refund",
        status="failed",
        context={
            "amount": amount,
            "error_type": type(e).__name__,
            "error_message": str(e),
            "traceback": traceback.format_exc()
        }
    )

9. Log Denials Too

Even if permission was denied, log it:
result = guard.check(agent_id, action, context)

if not result.allowed:
    # Always log denials
    guard.log(
        agent_id=agent_id,
        action=action,
        status="denied",
        context={
            **context,
            "denial_reason": result.reason,
            "requires_approval": result.requires_approval
        }
    )
    
    # Handle the denial
    notify_user(result.reason)
Why log denials:
  • Track attempted unauthorized actions
  • Identify misconfigured permissions
  • Security monitoring
  • Detect potential abuse

Security

10. Never Expose API Keys

Keep API keys secure:
import os
from agentwarden import AgentWarden

# Read from environment
guard = AgentWarden(api_key=os.getenv('AGENTWARDEN_API_KEY'))
Best practices:
  • Store in environment variables
  • Use secrets managers (AWS Secrets Manager, HashiCorp Vault)
  • Never commit to version control
  • Rotate keys regularly (every 90 days)
  • Use different keys per environment
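
Some of these practices can be enforced at startup. The following sketch reads the key from the environment and fails fast when it is missing, or when a test key (assumed here to carry the `test_` prefix described in this guide) is configured for production. `load_api_key` is a hypothetical helper, not an AgentWarden function.

```python
import os


def load_api_key(environment: str) -> str:
    """Read the API key from the environment and fail fast if it looks wrong."""
    key = os.environ.get("AGENTWARDEN_API_KEY", "")
    if not key:
        raise RuntimeError("AGENTWARDEN_API_KEY is not set")
    # Assumption: test keys start with "test_", as described in this guide.
    if environment == "production" and key.startswith("test_"):
        raise RuntimeError("test API key configured in production")
    return key
```

Failing at startup surfaces a misconfigured key immediately instead of on the first permission check.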

11. Fail-Safe Defaults

When in doubt, deny:
def safe_check(agent_id: str, action: str, context: dict) -> bool:
    """
    Wrapper that fails safely
    """
    try:
        result = guard.check(agent_id, action, context)
        return result.allowed
        
    except Exception as e:
        # If AgentWarden is unreachable, deny by default
        logger.error(f"Permission check failed: {e}")
        return False  # Fail-safe: deny

# Usage
if safe_check(agent_id, "dangerous.action", context):
    execute_dangerous_action()
else:
    # Denied - safe!
    pass
Exception: Only allow fail-open for truly low-risk actions that you’ve explicitly whitelisted.
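
If you do allow fail-open, it helps to make the exception explicit in code rather than scattering it across call sites. This is a minimal sketch, assuming the permission check is passed in as a zero-argument callable; `FAIL_OPEN_ACTIONS`, `check_or_default`, and the `metrics.read` action are hypothetical names used for illustration.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

# Hypothetical allowlist: only actions you have explicitly judged low-risk.
FAIL_OPEN_ACTIONS = frozenset({"metrics.read"})


def check_or_default(check_fn: Callable[[], bool], action: str) -> bool:
    """Run a permission check; on error, deny unless the action is allowlisted."""
    try:
        return check_fn()
    except Exception as exc:
        fail_open = action in FAIL_OPEN_ACTIONS
        logger.error("Permission check failed for %s (fail_open=%s): %s",
                     action, fail_open, exc)
        return fail_open
```

A call might look like `check_or_default(lambda: guard.check(agent_id, action, context).allowed, action)`. An explicit denial from a working check is always respected; the allowlist only applies when the check itself raises.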

12. Least Privilege Principle

Grant only the minimum permissions needed:
# ✅ Good - Specific permissions
permissions = [
    "email.send",           # Can send emails
    "database.users.read",  # Can read user data
    "api.sendgrid.call"     # Can call SendGrid API
]

# ❌ Bad - Overly broad permissions
permissions = [
    "email.*",         # Can do anything with email
    "database.*",      # Can do anything with database
    "api.*"            # Can call any API
]
Start restrictive, then relax:
  1. Begin with minimal permissions
  2. Monitor for denied actions
  3. Add permissions as needed
  4. Remove unused permissions
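
Step 2 can be as simple as tallying denials per action and reviewing the most frequent ones. The sketch below assumes log entries are available as dicts with `action` and `status` keys; the actual AgentWarden log format may differ, and `denied_action_counts` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable


def denied_action_counts(log_entries: Iterable[dict]) -> Counter:
    """Count denied attempts per action.

    Assumes each entry is a dict with "action" and "status" keys; the real
    log format may differ.
    """
    return Counter(
        entry["action"] for entry in log_entries if entry.get("status") == "denied"
    )
```

Calling `most_common()` on the result lists the actions that are denied most often, which are the first candidates for either a new permission or a bug fix in the agent.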

Performance

13. Reuse AgentWarden Instance

Create one instance and reuse it:
# app/services/guard.py
from agentwarden import AgentWarden
import os

# Create once
_guard_instance = None

def get_guard():
    global _guard_instance
    if _guard_instance is None:
        _guard_instance = AgentWarden(
            api_key=os.getenv('AGENTWARDEN_API_KEY')
        )
    return _guard_instance

# Use everywhere
from app.services.guard import get_guard

guard = get_guard()
result = guard.check(agent_id, action, context)

14. Implement Caching (Carefully)

Cache permission checks for identical actions:
from functools import lru_cache
import json
import time

@lru_cache(maxsize=1000)
def cached_check(agent_id: str, action: str, context_json: str, ttl_bucket: int):
    """
    Cached permission check, keyed on agent, action, context, and time bucket
    
    ⚠️ Use with caution - permissions can change!
    """
    # ttl_bucket is only part of the cache key; a new bucket forces a fresh check
    context = json.loads(context_json)
    return guard.check(agent_id, action, context)

def check_with_cache(agent_id: str, action: str, context: dict, ttl: int = 60):
    """
    Check permission with caching (entries expire after at most `ttl` seconds)
    """
    # Serialize context deterministically so equal contexts share a cache key
    context_json = json.dumps(context, sort_keys=True)
    
    # The bucket number changes every `ttl` seconds, which expires old entries
    ttl_bucket = int(time.time() // ttl)
    
    return cached_check(agent_id, action, context_json, ttl_bucket)
⚠️ Caching caveats:
  • Only cache for short periods (30-60 seconds)
  • Clear cache when permissions change
  • Don’t cache approval-required actions
  • Monitor cache hit rates
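
The caveats above can be built into the cache itself. The following is a sketch of a small TTL cache that never stores results still awaiting approval and can be cleared when permissions change. It assumes check results expose a `requires_approval` attribute, as in the earlier examples; `ShortLivedCheckCache` is a hypothetical class, not part of AgentWarden.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class ShortLivedCheckCache:
    """Small TTL cache for check results; a sketch, not part of AgentWarden."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_check(self, key: Hashable, check_fn: Callable[[], Any]) -> Any:
        """Return a fresh cached result, or call check_fn and maybe store it."""
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        result = check_fn()
        if getattr(result, "requires_approval", False):
            # Never cache results that still need a human approval decision.
            self._entries.pop(key, None)
        else:
            self._entries[key] = (now, result)
        return result

    def clear(self) -> None:
        """Drop everything, e.g. after permissions change in the dashboard."""
        self._entries.clear()
```

The key could be a tuple such as `(agent_id, action, json.dumps(context, sort_keys=True))`. Passing the clock in as a parameter keeps expiry testable without sleeping.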

15. Batch Operations

For bulk operations, check once and execute many:
# ✅ Good - Check once for batch
result = guard.check(
    agent_id=AGENT_ID,
    action="email.send_bulk",
    context={"recipient_count": len(recipients)}
)

if result.allowed:
    for recipient in recipients:
        send_email(recipient)
    
    # Log once for the batch
    guard.log(
        agent_id=AGENT_ID,
        action="email.send_bulk",
        status="success",
        context={"recipients_sent": len(recipients)}
    )

# ❌ Bad - Check for each recipient
for recipient in recipients:
    result = guard.check(AGENT_ID, "email.send", {"recipient": recipient})
    if result.allowed:
        send_email(recipient)

Testing

16. Test Permission Logic

Write tests for permission scenarios:
import pytest
from unittest.mock import Mock, patch

def test_refund_allowed_small_amount():
    """Test that small refunds are auto-approved"""
    with patch('agentwarden.AgentWarden') as MockGuard:
        mock_guard = MockGuard.return_value
        mock_guard.check.return_value = Mock(
            allowed=True,
            requires_approval=False
        )
        
        result = process_refund(amount=25.00)
        
        assert result['status'] == 'approved'
        mock_guard.check.assert_called_once()

def test_refund_requires_approval_large_amount():
    """Test that large refunds require approval"""
    with patch('agentwarden.AgentWarden') as MockGuard:
        mock_guard = MockGuard.return_value
        mock_guard.check.return_value = Mock(
            allowed=False,
            requires_approval=True,
            approval_id="apr_123"
        )
        
        result = process_refund(amount=500.00)
        
        assert result['status'] == 'pending_approval'
        assert 'approval_id' in result

17. Use Test API Keys

AgentWarden provides test API keys for development:
# .env.development
AGENTWARDEN_API_KEY=test_1234567890abcdef

# .env.production  
AGENTWARDEN_API_KEY=ak_production_key_here
Test keys:
  • Prefix: test_
  • Don’t count against plan limits
  • Isolated test data
  • Can be shared with developers

Monitoring

18. Track Key Metrics

Monitor these metrics:
from prometheus_client import Counter, Gauge, Histogram

# Permission checks
permission_checks = Counter(
    'agentwarden_checks_total',
    'Total permission checks',
    ['agent_id', 'action', 'result']
)

# Check latency
check_duration = Histogram(
    'agentwarden_check_duration_seconds',
    'Permission check duration'
)

# Approvals
approvals_pending = Gauge(
    'agentwarden_approvals_pending',
    'Number of pending approvals'
)

# Usage
with check_duration.time():
    result = guard.check(agent_id, action, context)

permission_checks.labels(
    agent_id=agent_id,
    action=action,
    result='allowed' if result.allowed else 'denied'
).inc()
Key metrics to track:
  • Permission check latency
  • Denial rate by action
  • Approval response time
  • Error rate
  • Rate limit hits

19. Set Up Alerts

Alert on anomalies:
# Alert on high denial rate
if denial_rate > 0.3:  # More than 30% denials
    send_alert("High denial rate detected", severity="warning")

# Alert on API errors
if error_rate > 0.05:  # More than 5% errors
    send_alert("High error rate from AgentWarden", severity="critical")

# Alert on pending approvals
if pending_approvals > 10:
    send_alert("Approval backlog building up", severity="warning")
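
The thresholds above need a `denial_rate` and an `error_rate` to compare against. One simple way to derive them is a sliding window over recent check outcomes, sketched below. `OutcomeWindow` and the outcome labels are hypothetical; in production you would more likely compute these rates from your metrics backend.

```python
from collections import Counter, deque


class OutcomeWindow:
    """Keep the most recent check outcomes and derive rates from them."""

    def __init__(self, max_size: int = 1000) -> None:
        # A deque with maxlen drops the oldest outcome automatically.
        self._outcomes = deque(maxlen=max_size)

    def record(self, outcome: str) -> None:
        """Record one outcome, e.g. "allowed", "denied", or "error"."""
        self._outcomes.append(outcome)

    def rate(self, outcome: str) -> float:
        """Fraction of recorded outcomes equal to `outcome` (0.0 if empty)."""
        if not self._outcomes:
            return 0.0
        return Counter(self._outcomes)[outcome] / len(self._outcomes)
```

With this in place, `denial_rate = window.rate("denied")` and `error_rate = window.rate("error")` feed directly into the alert conditions.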

Documentation

20. Document Your Agents

Keep documentation for each agent:
"""
Agent: customer-support-bot
Purpose: Automated customer support with refund capabilities
Owner: support-team@company.com
Created: 2024-01-15

Permissions:
- stripe.refund.small (≤ $50) - Auto-approved
- stripe.refund.medium (≤ $200) - Auto-approved  
- stripe.refund.large (≤ $1000) - Requires approval
- email.send - Auto-approved
- zendesk.ticket.update - Auto-approved

Last Audit: 2024-02-15
Notes: 
- Large refunds (>$200) escalate to finance team
- Average approval time: 15 minutes
- Handles ~500 requests/day
"""

Checklist

Use this checklist for every new agent:

Quick Reference

Practice     | Do                                     | Don’t
Agent Design | One agent per purpose                  | One super agent
Permissions  | Least privilege, specific actions      | Broad wildcards
Security     | Environment variables, fail-safe       | Hardcoded keys, fail-open
Context      | Rich, detailed context                 | Minimal context
Logging      | Log everything (success, fail, deny)   | Only log successes
Errors       | Handle gracefully, retry with backoff  | Ignore errors
Testing      | Test all permission scenarios          | No tests
Monitoring   | Track metrics, set alerts              | No monitoring
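
The Errors row mentions retrying with backoff. A minimal sketch of that idea follows, assuming transient failures surface as exceptions from the wrapped call; `retry_with_backoff` is a hypothetical helper, and the attempt count and delays are illustrative defaults rather than AgentWarden recommendations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(fn: Callable[[], T], max_attempts: int = 3,
                       base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on any exception with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts; let the caller's fail-safe handle it
            # Delay doubles each attempt (base, 2x, 4x, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

When the final attempt also fails, the exception propagates, so a fail-safe wrapper around the call can still deny by default.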

Next Steps