Production Deployment

Checklists, monitoring, scaling strategies, and operational best practices for production AgentiCraft deployments.

Pre-Deployment Checklist

Before deploying to production, verify:

Provider Configuration

  • API keys stored in a secrets manager (not environment variables or code)
  • Key pool configured with multiple keys per provider for redundancy
  • Circuit breaker enabled on all providers with appropriate thresholds
  • Rate limits set below provider quotas to avoid hard rejections
  • Fallback providers configured for critical paths

Resilience

  • Circuit breaker thresholds tuned to your traffic patterns
  • Retry policies configured with exponential backoff and jitter
  • Timeout values set on all LLM calls (recommended: 30s for completions, 60s for long-form)
  • Fallback routing tested — verify traffic shifts when a provider goes down
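The retry item above can be sketched as exponential backoff with full jitter. The `retry_with_backoff` helper and its parameters are illustrative, not part of the agenticraft-llm API:

```python
import random
import time

def retry_with_backoff(call, max_attempts=4, base_delay=0.5, max_delay=30.0):
    """Retry `call` with exponential backoff and full jitter (illustrative helper)."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error
            # Full jitter: sleep a random amount up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))

# Example: a call that fails twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("provider timeout")
    return "ok"

print(retry_with_backoff(flaky, base_delay=0.01))  # -> ok
```

Full jitter (random delay in `[0, cap]`) spreads retries out so a burst of failures does not retry in lockstep against an already-struggling provider.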

Monitoring

  • Request latency tracked per provider
  • Error rates tracked per provider and error type
  • Token usage tracked for cost management
  • Circuit breaker state changes logged
  • Key rotation events logged
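The metrics above can be kept in a small per-provider tracker. This in-process sketch is illustrative; a production deployment would export the same counters to a metrics backend such as Prometheus or OpenTelemetry:

```python
from collections import defaultdict

class ProviderMetrics:
    """Minimal in-process metrics sketch (illustrative, not agenticraft-llm's API)."""
    def __init__(self):
        self.latencies = defaultdict(list)                   # provider -> [seconds]
        self.errors = defaultdict(lambda: defaultdict(int))  # provider -> error type -> count
        self.tokens = defaultdict(int)                       # provider -> total tokens

    def record_success(self, provider, latency_s, tokens):
        self.latencies[provider].append(latency_s)
        self.tokens[provider] += tokens

    def record_error(self, provider, error_type):
        self.errors[provider][error_type] += 1

    def p50_latency(self, provider):
        samples = sorted(self.latencies[provider])
        return samples[len(samples) // 2] if samples else None

metrics = ProviderMetrics()
metrics.record_success("openai", 0.42, tokens=1200)
metrics.record_success("openai", 0.38, tokens=900)
metrics.record_error("anthropic", "rate_limit")
print(metrics.p50_latency("openai"), metrics.tokens["openai"])  # -> 0.42 2100
```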

Multi-Provider Resilience

Circuit Breaker Configuration

from agenticraft_llm import CircuitBreaker
 
breaker = CircuitBreaker(
    failure_threshold=5,     # Open after 5 failures
    recovery_timeout=30.0,   # Try recovery after 30s
    half_open_max=2,         # Allow 2 test requests in half-open
)
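To see how these three parameters interact, here is a minimal closed → open → half-open state machine. It illustrates the semantics only; it is not agenticraft-llm's implementation:

```python
import time

class SimpleBreaker:
    """Illustrative circuit-breaker state machine."""
    def __init__(self, failure_threshold=5, recovery_timeout=30.0, half_open_max=2):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.half_open_max = half_open_max
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0
        self.half_open_inflight = 0

    def allow_request(self):
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.recovery_timeout:
                self.state = "half_open"  # Timeout elapsed: start probing for recovery
                self.half_open_inflight = 0
            else:
                return False              # Still open: fail fast
        if self.state == "half_open":
            if self.half_open_inflight >= self.half_open_max:
                return False              # Only half_open_max probes at a time
            self.half_open_inflight += 1
        return True

    def record_failure(self):
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"           # Trip (or re-trip after a failed probe)
            self.opened_at = time.monotonic()
            self.failures = 0

    def record_success(self):
        if self.state == "half_open":
            self.state = "closed"         # Probe succeeded: resume normal traffic
        self.failures = 0
```

While open, requests fail fast instead of piling onto a struggling provider; after `recovery_timeout`, a few probe requests decide whether to close again.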

Fallback Chain

Configure providers in priority order. Traffic automatically shifts on failure:

from agenticraft_llm import CostAwareRouter, LLMProviderConfig
 
router = CostAwareRouter(
    providers=[
        LLMProviderConfig(provider="openai", model="gpt-5.4"),
        LLMProviderConfig(provider="anthropic", model="claude-sonnet-4-6"),
        LLMProviderConfig(provider="google", model="gemini-3.1-pro-preview"),
    ],
    circuit_breaker=breaker,
)
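The failover behavior amounts to walking the priority list until one provider answers. The helper below is an illustrative sketch of that loop, not the router's internals:

```python
class AllProvidersFailed(Exception):
    pass

def complete_with_fallback(providers, call):
    """Try providers in priority order; fall through to the next on failure.
    `providers` is a list of names and `call(provider)` does the real request
    (both illustrative)."""
    errors = {}
    for provider in providers:
        try:
            return call(provider)
        except Exception as exc:
            errors[provider] = exc  # Record the failure and try the next provider
    raise AllProvidersFailed(errors)

def fake_call(provider):
    if provider == "openai":
        raise TimeoutError("simulated outage")
    return f"{provider}: response"

print(complete_with_fallback(["openai", "anthropic", "google"], fake_call))
# -> anthropic: response
```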

Cost Management

Token Budget Tracking

Monitor token usage per provider to control costs:

# Check usage after each call (the $0.00001/token rate below is illustrative;
# real pricing varies by model and by input vs. output tokens)
response = await router.complete(messages=messages)
print(f"Tokens: {response.usage.total_tokens}")
print(f"Estimated cost: ${response.usage.total_tokens * 0.00001:.4f}")
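To enforce a hard limit rather than just observe usage, a simple accumulator can be layered on top of these calls. The `TokenBudget` class and flat per-token rate are illustrative, not part of agenticraft-llm:

```python
class BudgetExceeded(Exception):
    pass

class TokenBudget:
    """Illustrative token budget; real deployments would persist the counter
    and price each model's input/output tokens separately."""
    def __init__(self, max_tokens, usd_per_token=0.00001):
        self.max_tokens = max_tokens
        self.usd_per_token = usd_per_token
        self.used = 0

    def charge(self, tokens):
        if self.used + tokens > self.max_tokens:
            raise BudgetExceeded(f"{self.used + tokens} > {self.max_tokens} tokens")
        self.used += tokens

    @property
    def spent_usd(self):
        return self.used * self.usd_per_token

budget = TokenBudget(max_tokens=1_000_000)
budget.charge(12_500)              # e.g. response.usage.total_tokens
print(f"${budget.spent_usd:.4f}")  # -> $0.1250
```

Charging before dispatching a request (using an estimate) turns the budget into a circuit of its own: once exhausted, calls fail fast instead of silently accruing cost.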

Cost-Aware Routing

The router uses Thompson sampling to balance cost, latency, and quality. Configure the relative weights:

router = CostAwareRouter(
    providers=[...],
    cost_weight=0.7,      # Prioritize cost savings
    latency_weight=0.2,   # Some latency sensitivity
    quality_weight=0.1,   # Minimal quality differentiation
)
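The idea behind Thompson sampling here: keep a Beta posterior over each provider's success rate, draw a sample from each, and scale the draw by a weighted cost/latency/quality score. The sketch below is a simplified illustration of that idea; the names, score fields, and scoring function are assumptions, not the router's actual implementation:

```python
import random

def thompson_pick(stats, weights, rng=random):
    """Pick a provider by sampling Beta(successes+1, failures+1) per provider
    and scaling by a weighted composite score. Illustrative only."""
    best, best_score = None, -1.0
    for name, s in stats.items():
        # Posterior draw: uncertain providers occasionally win, so the
        # router keeps exploring instead of locking onto one choice.
        reliability = rng.betavariate(s["successes"] + 1, s["failures"] + 1)
        composite = (weights["cost"] * s["cheapness"]
                     + weights["latency"] * s["speed"]
                     + weights["quality"] * s["quality"])
        score = reliability * composite
        if score > best_score:
            best, best_score = name, score
    return best

stats = {
    "openai":    {"successes": 90, "failures": 10, "cheapness": 0.4, "speed": 0.7, "quality": 0.9},
    "anthropic": {"successes": 95, "failures": 5,  "cheapness": 0.5, "speed": 0.6, "quality": 0.9},
}
weights = {"cost": 0.7, "latency": 0.2, "quality": 0.1}
print(thompson_pick(stats, weights))
```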

Scaling Considerations

Key Pool Sizing

Rule of thumb: configure at least 3 keys per provider for production workloads. This provides:

  • Redundancy if a key is revoked or rate-limited
  • Better rate limit distribution across keys
  • Zero-downtime key rotation
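The benefit of a multi-key pool shows up in a minimal rotation sketch: disabling one key leaves the others serving traffic. The `KeyPool` class is illustrative; agenticraft-llm manages key pools internally:

```python
class KeyPool:
    """Round-robin over active keys (illustrative sketch)."""
    def __init__(self, keys):
        self.active = list(keys)

    def disable(self, key):
        self.active.remove(key)  # e.g. on a 401 or sustained 429s
        if not self.active:
            raise RuntimeError("no usable keys left")

    def next_key(self):
        key = self.active.pop(0)  # Rotate: take from the front, re-queue at the back
        self.active.append(key)
        return key

pool = KeyPool(["sk-key-1", "sk-key-2", "sk-key-3"])
print(pool.next_key())  # -> sk-key-1
pool.disable("sk-key-2")
print(pool.next_key(), pool.next_key())  # remaining keys keep rotating
```

With three or more keys, rotation is also zero-downtime: add the new key, wait for it to take traffic, then disable the old one.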

Connection Pooling

agenticraft-llm uses httpx with connection pooling by default. For high-throughput scenarios:

config = LLMProviderConfig(
    provider="openai",
    model="gpt-5-mini",
    max_connections=20,    # Default: 10
    timeout=30.0,          # Default: 60.0
)

Next Steps