# Auto-Tracing with OpenTelemetry
Flock includes automatic OpenTelemetry instrumentation for all agent methods, providing detailed observability for debugging and monitoring.
## Quick Start
Enable auto-tracing by setting the environment variable:
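```bash
export FLOCK_AUTO_TRACE=true
python your_agent.py
```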
This automatically:

- ✅ Wraps all public methods with OTEL spans
- ✅ Configures logging to DEBUG level
- ✅ Captures trace IDs, correlation IDs, and agent metadata
- ✅ Creates parent-child span relationships for call hierarchies
## Configuration
### Basic Usage (Console Only)
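A minimal setup sketch: with no exporter configured, spans and DEBUG logs go to the console only.

```bash
# Console logging only — no DuckDB or OTLP export configured
export FLOCK_AUTO_TRACE=true
python your_agent.py
```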
### Export to DuckDB
```bash
# Export traces to .flock/traces.duckdb
export FLOCK_AUTO_TRACE=true
export FLOCK_TRACE_FILE=true
python your_agent.py
```
Traces are stored in a DuckDB database, which provides:

- ✅ 10-100x faster queries than JSON/SQLite
- ✅ Built-in trace viewer UI in the Flock dashboard
- ✅ SQL analytics for debugging and monitoring
- ✅ Efficient columnar storage
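For ad-hoc analytics you can open the database directly with the `duckdb` Python client. The sketch below assumes the `spans` table and `created_at` column described in the TTL section; adjust names to your actual schema.

```python
import duckdb

# Open the trace database written by FLOCK_TRACE_FILE=true
con = duckdb.connect(".flock/traces.duckdb", read_only=True)

# Count spans recorded in the last 24 hours
# (assumes the `spans` table and `created_at` column mentioned below)
count = con.execute(
    "SELECT count(*) FROM spans WHERE created_at > now() - INTERVAL 1 DAY"
).fetchone()
print(count)
```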
### Export to Grafana/Jaeger (OTLP)
```bash
# Send traces to an OTLP endpoint (Grafana, Jaeger, etc.)
export FLOCK_AUTO_TRACE=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
python your_agent.py
```
### Disable Auto-Tracing
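Assuming the flag is read as a boolean, leave it unset (or set it to false) to disable tracing:

```bash
# Either unset the flag or set it to false
unset FLOCK_AUTO_TRACE
# or
export FLOCK_AUTO_TRACE=false
```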
## Filtering: Whitelist and Blacklist
Control which operations get traced to reduce overhead and noise. This is especially important for streaming token operations, which can cause performance issues when traced.
### How Filtering Works: Two-Stage Process
#### Stage 1: Wrapping Methods with `@traced_and_logged`
Methods must first be wrapped with the tracing decorator. This happens automatically via metaclasses (see the sketch after this list):

- `AutoTracedMeta`: Wraps all public methods in `Agent` and `Flock` classes
- `TracedModelMeta`: Wraps all public methods in `AgentComponent` subclasses

Classes using these metaclasses:

- `Agent` (from agent.py)
- `Flock` (from orchestrator.py)
- `AgentComponent` and all subclasses:
    - `EngineComponent` (DSPyEngine, ClaudeEngine, etc.)
    - `OutputUtilityComponent`
    - `DashboardEventCollector`
    - `MetricsUtility`, `LoggingUtility`
    - All custom components
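Conceptually, the metaclass approach looks like the simplified sketch below. This is an illustration of the mechanism only, not Flock's actual `AutoTracedMeta` implementation: every public callable defined on a class is wrapped at class-creation time.

```python
import functools


class AutoTracedMetaSketch(type):
    """Illustrative only: wraps public methods at class creation time."""

    def __new__(mcls, name, bases, namespace):
        for attr, value in list(namespace.items()):
            if callable(value) and not attr.startswith("_"):
                namespace[attr] = mcls._wrap(f"{name}.{attr}", value)
        return super().__new__(mcls, name, bases, namespace)

    @staticmethod
    def _wrap(span_name, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Real code would create an OTEL span here; we just log the call.
            print(f"[trace] entering {span_name}")
            return func(*args, **kwargs)

        return wrapper


class MyComponent(metaclass=AutoTracedMetaSketch):
    def do_work(self):           # public -> wrapped
        return "done"

    def _internal_helper(self):  # private -> left alone
        return "hidden"
```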
#### Stage 2: Filter Check (Whitelist/Blacklist)
Only wrapped methods reach the filter. Before creating an OTEL span, the decorator checks:

```python
service_name = span_name.split(".")[0]  # Extract class name
# Example: "Agent.execute" → service_name = "Agent"

if not should_trace(service_name, span_name):
    return func(*args, **kwargs)  # Skip span creation entirely
```
Key Point: Methods that are not wrapped (e.g., plain functions, non-component classes) are never traced, regardless of whitelist/blacklist settings.
### Whitelist: FLOCK_TRACE_SERVICES
Filters by CLASS NAME (case-insensitive): only methods from the specified classes are traced.
Behavior:

- If set: Only traces methods from listed classes
- If empty/unset: Traces ALL wrapped classes
- Case-insensitive: `"agent"`, `"Agent"`, and `"AGENT"` all match the `Agent` class
Example:
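A whitelist consistent with the results listed below (the exact values are illustrative):

```bash
export FLOCK_TRACE_SERVICES='["agent", "flock", "dspyengine"]'
```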
Result:

- ✅ `Agent.execute` - traced (Agent in whitelist)
- ✅ `Flock.publish` - traced (Flock in whitelist)
- ✅ `DSPyEngine.evaluate` - traced (DSPyEngine in whitelist)
- ❌ `OutputUtilityComponent.on_post_evaluate` - NOT traced (not in whitelist)
- ❌ `DashboardEventCollector.collect_event` - NOT traced (not in whitelist)
### Blacklist: FLOCK_TRACE_IGNORE
Filters by FULL OPERATION NAME (exact match). Never trace specific methods.
```bash
# Never trace these specific methods
export FLOCK_TRACE_IGNORE='["DashboardEventCollector.set_websocket_manager", "Agent.get_identity"]'
```
Behavior:

- Exact match on `ClassName.method_name`
- Takes priority over whitelist
- Use this to exclude noisy or low-value operations
Example:

```bash
FLOCK_TRACE_SERVICES='["agent", "dashboardeventcollector"]'
FLOCK_TRACE_IGNORE='["DashboardEventCollector.set_websocket_manager"]'
```
Result:

- ✅ `Agent.execute` - traced (in whitelist, not in blacklist)
- ✅ `DashboardEventCollector.collect_event` - traced (in whitelist, not in blacklist)
- ❌ `DashboardEventCollector.set_websocket_manager` - NOT traced (in blacklist)
### Filter Priority
- Blacklist (highest priority) - If in `FLOCK_TRACE_IGNORE`, never trace
- Whitelist - If `FLOCK_TRACE_SERVICES` is set, only trace listed services
- Default - If no filters are set, trace everything that's wrapped
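The priority order can be expressed as a small predicate. The sketch below illustrates the rules above; it is not Flock's actual `should_trace` implementation, and the environment-variable parsing shown is an assumption.

```python
import json
import os


def should_trace_sketch(service_name: str, span_name: str) -> bool:
    """Illustrative version of the filter rules (not Flock's actual code)."""
    ignore = json.loads(os.getenv("FLOCK_TRACE_IGNORE", "[]"))
    services = json.loads(os.getenv("FLOCK_TRACE_SERVICES", "[]"))

    # 1. Blacklist: exact match on "ClassName.method_name" always wins
    if span_name in ignore:
        return False

    # 2. Whitelist: if set, only listed classes are traced (case-insensitive)
    if services:
        return service_name.lower() in {s.lower() for s in services}

    # 3. Default: trace everything that's wrapped
    return True
```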
### Recommended Configuration
Add to your `.env` file:
```
# Trace core agent execution, avoid streaming token overhead
FLOCK_TRACE_SERVICES=["flock", "agent", "dspyengine", "outpututilitycomponent"]

# Exclude noisy operations
FLOCK_TRACE_IGNORE=["DashboardEventCollector.set_websocket_manager"]

# Auto-delete traces older than 30 days
FLOCK_TRACE_TTL_DAYS=30
```
Why these defaults?

- `flock` - Core orchestration (publish, scheduling)
- `agent` - Agent lifecycle and execution
- `dspyengine` - LLM calls and responses
- `outpututilitycomponent` - Output formatting
- Excludes dashboard/streaming operations to avoid performance issues
- TTL keeps database size manageable by removing old debugging data
## Trace Time-To-Live (TTL)
Control database size by automatically deleting old traces:
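```bash
# Delete traces older than 30 days on startup
export FLOCK_TRACE_TTL_DAYS=30
```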
How it works:

- Cleanup runs on application startup (when the DuckDB exporter initializes)
- Uses the `created_at` timestamp field from the spans table
- Deletes all spans older than the specified number of days
- Prints a summary: `[DuckDB TTL] Deleted 1234 spans older than 30 days`
When to use TTL:

- ✅ Development environments - Keep the database small, remove old debugging sessions
- ✅ Production - Retain recent traces for debugging, delete historical data
- ✅ CI/CD - Clean up test traces automatically
- ❌ Long-term analytics - If you need historical trace data, disable TTL or export to separate storage
Performance impact:

- Cleanup uses the indexed `created_at` field for fast deletion
- Runs only once per application startup
- Near-zero runtime overhead
Example scenarios:

```bash
# Development: Keep last 7 days only
FLOCK_TRACE_TTL_DAYS=7

# Production: Keep last 30 days
FLOCK_TRACE_TTL_DAYS=30

# Long-term retention: Keep last 90 days
FLOCK_TRACE_TTL_DAYS=90

# No cleanup: Keep all traces forever
# FLOCK_TRACE_TTL_DAYS= (leave empty or comment out)
```
## What Gets Captured
### Span Attributes
Every traced method automatically captures:
| Attribute | Description | Example |
|---|---|---|
| `class` | Class name of the method | `Agent`, `Flock`, `DSPyEngine` |
| `function` | Method name | `execute`, `publish`, `evaluate` |
| `module` | Python module path | `flock.orchestrator` |
| `agent.name` | Agent identifier (if applicable) | `movie`, `tagline` |
| `agent.description` | Agent description | `Generate movie ideas` |
| `correlation_id` | Request correlation ID | `12d0fcda-e7f7-4c96-ae8e-14ae4eca1518` |
| `task_id` | Task identifier | `task_abc123` |
| `result.type` | Return type | `list`, `EvalResult`, `Artifact` |
| `result.length` | Collection size (if applicable) | `3` |
### Span Hierarchy Example
```
Flock.publish (trace_id: ae40f0061e3f1bcfebe169191d138078)
└── Agent.execute
    ├── Agent.on_initialize
    │   ├── OutputUtilityComponent.on_initialize
    │   └── DSPyEngine.on_initialize
    ├── Agent.on_pre_consume
    ├── Agent.on_pre_evaluate
    ├── Agent.evaluate
    │   └── DSPyEngine.evaluate
    ├── Agent.on_post_evaluate
    ├── Agent.on_post_publish
    └── Agent.on_terminate
```
All spans within the same execution share the same `trace_id`, making it easy to trace a complete request flow.
### Console Output
With auto-tracing enabled, you'll see:
```
2025-10-07 15:32:40 | DEBUG | [trace_id: ae40f0061e3f1bcfebe169191d138078] | [tools] | Flock.publish executed successfully
2025-10-07 15:32:40 | DEBUG | [trace_id: ae40f0061e3f1bcfebe169191d138078] | [tools] | Agent.execute executed successfully
2025-10-07 15:32:40 | DEBUG | [trace_id: ae40f0061e3f1bcfebe169191d138078] | [tools] | DSPyEngine.evaluate executed successfully
```
Notice how all logs share the same `trace_id`, making it easy to filter and follow the execution flow.
## Using with Grafana
### 1. Start Grafana + Tempo (OTLP Collector)
```yaml
# docker-compose.yml
version: '3'
services:
  tempo:
    image: grafana/tempo:latest
    command: [ "-config.file=/etc/tempo.yaml" ]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "3200:3200"   # Tempo
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
```
```yaml
# tempo.yaml
server:
  http_listen_port: 3200
distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo/traces
```
### 2. Run Your Agent with OTLP Export
```bash
export FLOCK_AUTO_TRACE=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
python your_agent.py
```
### 3. Query in Grafana
- Open Grafana at `http://localhost:3000`
- Add Tempo as a data source
- Query by:
    - `trace_id` - View a complete request trace
    - `correlation_id` - Group related agent executions
    - `agent.name` - Filter by a specific agent
    - `service.name=flock-auto-trace` - All Flock traces
### 4. Create Dashboards
Useful queries for Grafana panels:
```
# Agent execution duration by agent name
histogram_quantile(0.95,
  rate(traces{service.name="flock-auto-trace", agent.name!=""}[5m])
)

# Error rate by agent
sum(rate(traces{service.name="flock-auto-trace", status.code="ERROR"}[5m]))
  by (agent.name)

# Traces by correlation ID
traces{correlation_id="12d0fcda-e7f7-4c96-ae8e-14ae4eca1518"}
```
## Using with Jaeger
### 1. Start Jaeger
```bash
docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 4317:4317 \
  -p 16686:16686 \
  jaegertracing/all-in-one:latest
```
### 2. Run Your Agent
```bash
export FLOCK_AUTO_TRACE=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
python your_agent.py
```
### 3. View Traces
- Open the Jaeger UI at `http://localhost:16686`
- Select service: `flock-auto-trace`
- Search by:
    - Agent name
    - Correlation ID
    - Time range
## Skipping Methods from Tracing
### Option 1: Use Filtering (Recommended)
Use environment variables to control tracing at runtime without code changes:
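For example, reusing the filtering variables described above (values are illustrative):

```bash
export FLOCK_TRACE_SERVICES='["agent", "flock"]'
export FLOCK_TRACE_IGNORE='["DashboardEventCollector.set_websocket_manager"]'
```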
This is preferred because:

- ✅ No code changes required
- ✅ Can be adjusted per environment (dev/staging/prod)
- ✅ Easy to add/remove without modifying source
### Option 2: Use the `@skip_trace` Decorator
For methods that should never be traced in any environment:
```python
from flock.logging.auto_trace import skip_trace


class MyComponent(AgentComponent):
    def important_method(self):
        # This will be traced (if the class is whitelisted)
        pass

    @skip_trace
    def noisy_helper(self):
        # This will NEVER be traced, even if whitelisted
        pass
```
When to use `@skip_trace`:

- Methods called thousands of times per second
- Methods with sensitive data you never want logged
- Internal helpers that provide no debugging value
## Performance Considerations
- Overhead: Each span adds ~0.1-0.5ms overhead
- Console logging: DEBUG logs can slow down execution significantly
- DuckDB export: Minimal overhead (~0.01ms per span)
- OTLP export: Batched, minimal overhead (~0.02ms per span)
- Filtering: Filter check happens before span creation, so filtered operations have near-zero overhead
For production:

- Use filtering to trace only core services (recommended)
- Use DuckDB export instead of OTLP for lower overhead
- Disable DEBUG console logging
- Use the blacklist to exclude high-frequency operations
Example production config:
```bash
export FLOCK_AUTO_TRACE=true
export FLOCK_TRACE_FILE=true  # DuckDB only, no console spam
export FLOCK_TRACE_SERVICES='["agent", "flock"]'  # Core services only
export FLOCK_TRACE_IGNORE='["DashboardEventCollector.set_websocket_manager"]'
```
## Troubleshooting
### Trace IDs showing as "no-trace"
Cause: Telemetry not initialized

Fix: Ensure `FLOCK_AUTO_TRACE=true` is set before importing `flock`
### OTLP connection timeout
Cause: OTLP endpoint not reachable

Fix: Don't set `OTEL_EXPORTER_OTLP_ENDPOINT` unless you have a collector running
### Logs too verbose
Cause: DEBUG level captures everything
Fix: Reduce logging or disable auto-trace for production
## Architecture
Auto-tracing uses a two-stage approach:
### Stage 1: Method Wrapping (Class Creation Time)
- Metaclasses: `AutoTracedMeta` and `TracedModelMeta` wrap all public methods at class creation time
- Decorator: `@traced_and_logged` is added to each public method
- Applied to:
    - `Agent` (agent.py) - via `AutoTracedMeta`
    - `Flock` (orchestrator.py) - via `AutoTracedMeta`
    - `AgentComponent` (components.py) - via `TracedModelMeta`
    - All their subclasses inherit the metaclass
### Stage 2: Filter Check (Runtime)
- Filter Configuration: `TraceFilterConfig` is loaded from environment variables at startup
- Filter Check: Before creating spans, the decorator checks `should_trace(service, operation)`
- Early Exit: Filtered operations skip span creation entirely (near-zero overhead)
### OTEL Span Creation
- Context Propagation: Uses OTEL's `start_as_current_span` for parent-child relationships
- Attribute Extraction: Automatically extracts agent name, correlation ID, etc. from method arguments
- Error Handling: Records exceptions and sets span status
- Output Capture: Serializes return values to JSON for debugging
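A simplified decorator along these lines can be sketched with the standard OpenTelemetry Python API. This is an illustration of the pattern, not Flock's actual `@traced_and_logged` implementation; attribute names beyond those in the table above (e.g. `result.json`) are assumptions.

```python
import functools
import json

from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer("flock-auto-trace")


def traced_sketch(func):
    """Illustrates span creation, error handling, and output capture."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        span_name = f"{type(self).__name__}.{func.__name__}"
        with tracer.start_as_current_span(span_name) as span:
            span.set_attribute("class", type(self).__name__)
            span.set_attribute("function", func.__name__)
            try:
                result = func(self, *args, **kwargs)
            except Exception as exc:
                span.record_exception(exc)
                span.set_status(Status(StatusCode.ERROR, str(exc)))
                raise
            # Best-effort serialization of the return value for debugging
            span.set_attribute("result.type", type(result).__name__)
            span.set_attribute("result.json", json.dumps(result, default=str))
            return result

    return wrapper
```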
### Data Flow
```
Method Call
  ↓ Filter Check (whitelist/blacklist)
  ↓ If filtered: Execute method directly (no tracing)
  ↓ If not filtered: Create OTEL span
  ↓ Extract attributes (class, agent.name, correlation_id, etc.)
  ↓ Execute method
  ↓ Capture output
  ↓ Set span status
  ↓ Export to DuckDB/OTLP
```
## Why This Matters for AI Development
When AI agents (like Claude) debug your code, they rely on printf debugging since they can't use interactive debuggers. Auto-tracing provides:
- Complete execution trace - See exactly what methods were called and in what order
- Correlation tracking - Group related operations across multiple agents
- Automatic context - No manual logging needed
- Visual debugging - View traces in Grafana/Jaeger for complex flows
This dramatically improves AI-assisted development by making execution flows transparent.
## Example Output
```
$ export FLOCK_AUTO_TRACE=true && python examples/showcase/02_hello_flock.py

2025-10-07 15:32:40 | DEBUG | [trace_id: d1339d844b78a63d9a2e2b2f4f726e25] | Flock.register_agent executed successfully
2025-10-07 15:32:40 | DEBUG | [trace_id: d1339d844b78a63d9a2e2b2f4f726e25] | Flock.agent executed successfully
2025-10-07 15:32:40 | DEBUG | [trace_id: ae40f0061e3f1bcfebe169191d138078] | Flock.publish executed successfully
2025-10-07 15:32:40 | DEBUG | [trace_id: ae40f0061e3f1bcfebe169191d138078] | OutputUtilityComponent.on_initialize executed successfully
2025-10-07 15:32:40 | DEBUG | [trace_id: ae40f0061e3f1bcfebe169191d138078] | DSPyEngine.on_initialize executed successfully
...

✅ Movie and tagline generated!
```
Notice:

- Each trace has a unique `trace_id`
- Related operations share the same `trace_id`
- Method names show full context: `Class.method`