# Evaluators 🧑‍⚖️
Evaluators are responsible for turning an agent's declarative specification into actual work. They live inside the agent instance (`agent.evaluator`) and must implement the async method:
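```python
# Sketch of the required method. The parameter names match the
# custom-evaluator section below; the type annotations are assumptions
# and may vary between versions.
async def evaluate(self, agent, inputs: dict, tools: list) -> dict:
    ...
```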
The returned dict is validated against the agent's output
signature.
## 1. Built-in Evaluators
| Class | Location | Description |
|---|---|---|
| `DeclarativeEvaluator` | `flock.evaluators.declarative` | Default. Generates a prompt from the agent's signatures and calls `litellm`. |
| `DSPyEvaluator` | `flock.evaluators.dspy` | Integrates with DSPy for structured prompting & optimisation. |
| `RuleEvaluator` | `flock.evaluators.rule` | Pure-Python rules engine (no LLM). Useful for testing. |
## 2. Configuring Evaluators
All evaluators accept a `*Config` companion (a Pydantic model) with the relevant fields. Example for `DeclarativeEvaluator`:
```python
from flock.evaluators.declarative import DeclarativeEvaluator, DeclarativeEvaluatorConfig

config = DeclarativeEvaluatorConfig(
    model="anthropic/claude-3-opus",
    temperature=0.2,
    max_tokens=2048,
    stream=True,
    include_thought_process=False,
    use_cache=True,
)

evaluator = DeclarativeEvaluator(name="default", config=config)
```
You can then inject the evaluator when you instantiate the agent or replace it later:
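A minimal sketch of both options (the exact `FlockAgent` constructor arguments may differ by version; the `question`/`answer` signature is illustrative):

```python
from flock.core import FlockAgent  # import path assumed

agent = FlockAgent(
    name="my_agent",
    input="question",     # illustrative input signature
    output="answer",      # illustrative output signature
    evaluator=evaluator,  # inject at construction time
)

# ...or swap the evaluator on an existing agent later:
agent.evaluator = evaluator
```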
## 3. Writing a Custom Evaluator
- Subclass `FlockEvaluator` and optionally declare a `FlockEvaluatorConfig`.
- Implement `async evaluate(self, agent, inputs, tools)`.
- Register with the registry (optional) for serialization:
```python
from flock.core import flock_component
from flock.core.flock_evaluator import FlockEvaluator  # path assumed; adjust to your version

@flock_component
class MyCoolEvaluator(FlockEvaluator):
    async def evaluate(self, agent, inputs, tools):
        # do stuff…
        return {"answer": "42"}
```
## 4. Best Practices
- Respect the `model` selected on the agent unless your evaluator has a reason to override it.
- Validate inputs early; raise `ValueError` to trigger `on_error` hooks (see the sketch after this list).
- Make expensive network calls async to keep the event loop responsive.
- Return only the fields declared in the output signature (extra fields are dropped).
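A minimal sketch of the early-validation pattern (the `question` input name is hypothetical):

```python
async def evaluate(self, agent, inputs, tools):
    # Fail fast before any network call so on_error hooks fire early.
    if "question" not in inputs:
        raise ValueError("missing required input: 'question'")
    ...
```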
➡️ Continue to Modules to see how to augment agent behaviour.