# Deployment 🚀
Getting a prototype running locally is great; putting it in production requires reliability, observability, and scalability. Flock ships with first-class Temporal integration, which gives you all three out of the box.
## 1. Local vs. Temporal
| Aspect | Local Mode (default) | Temporal Mode |
|---|---|---|
| Fault Tolerance | Exceptions bubble up to the caller | Automatic retries, timeouts, watchdogs |
| State Persistence | In-memory only | Durable (database, visibility store) |
| Concurrency | Single Python process | Horizontal scaling across workers |
| Visibility | Stdout + logs | Temporal Web UI, Prometheus, Grafana |
Switching modes requires only a single configuration flag.
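A minimal sketch of the switch, assuming Flock's constructor accepts an `enable_temporal` flag (the import path, flag name, and model string here are illustrative; check your Flock version's docs):

```python
from flock.core import Flock  # package layout may differ by version

# Local mode (default): the workflow runs in-process,
# and exceptions bubble up to the caller.
flock = Flock(model="openai/gpt-4o")

# Temporal mode: the same workflow is executed durably on a Temporal
# cluster, with retries and persisted state managed by the server.
flock = Flock(model="openai/gpt-4o", enable_temporal=True)
```

No agent or workflow code changes between the two modes; only the execution backend does.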
## 2. Production Checklist
- Secrets Management – Store LLM API keys in Vault, AWS Parameter Store, etc.
- Observability – Export OpenTelemetry traces to Jaeger or Tempo.
- Autoscaling – Use `temporal worker --concurrency` or a Kubernetes HPA.
- Caching – Enable the evaluator cache (Redis, Memcached, or `litellm`'s built-in cache).
- Security – Restrict tool access; disable `cloudpickle` deserialization if loading untrusted payloads.
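To make the secrets-management item concrete, here is a stdlib-only sketch of reading an LLM API key from the environment, where a Vault or Parameter Store agent is expected to inject it at deploy time (the helper name and the `OPENAI_API_KEY` variable are illustrative):

```python
import os


def load_llm_api_key(var: str = "OPENAI_API_KEY") -> str:
    """Read the key injected by the secret store instead of hardcoding it."""
    key = os.environ.get(var)
    if not key:
        # Fail fast at startup rather than on the first LLM call.
        raise RuntimeError(
            f"{var} is not set; inject it via Vault, AWS Parameter Store, etc."
        )
    return key
```

Failing fast at process start surfaces misconfigured secrets immediately, instead of mid-workflow when the first model call is made.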
## 3. Reference Architectures
### Docker Compose (PoC)
- api (FastAPI) + flock code
- Temporal server via the official temporal-io Docker image
- Prometheus + Grafana for metrics
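A minimal `docker-compose.yml` sketch of this PoC stack (service names and image tags are illustrative, not taken from the Flock repo):

```yaml
services:
  api:
    build: .                            # FastAPI app bundling the flock code
    ports: ["8000:8000"]
    environment:
      - TEMPORAL_ADDRESS=temporal:7233  # worker connects here
    depends_on: [temporal]
  temporal:
    image: temporalio/auto-setup:latest # all-in-one dev/PoC server image
    ports: ["7233:7233"]
  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
```

The `auto-setup` image bundles the server and its database for convenience; it is suitable for a PoC but not for production.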
### Kubernetes (Prod)
- Flock API image
- Temporal cluster (six-service Helm chart)
- Workers scaled by queue
- OpenTelemetry Collector → Tempo → Grafana
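For the worker tier, a `HorizontalPodAutoscaler` sketch is shown below (the deployment name `flock-worker` is illustrative; in production you would ideally scale on a Temporal task-queue backlog metric exported through the OpenTelemetry Collector, but plain CPU utilization is used here for brevity):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flock-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: flock-worker      # deployment running the Temporal workers
  minReplicas: 2            # keep headroom for failover
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```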
## 4. Next Steps
- Learn the details in Temporal Deployment.
- Check the `scripts/` folder for helper shell scripts.
- Browse `tests/` for smoke tests that validate a cluster.