# Evaluators 🧑‍⚖️
Evaluators are responsible for turning an agent's declarative specification into actual work. They live inside the agent instance (`agent.evaluator`) and must implement the async method:
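```python
# Sketch of the required method. The parameter names match the
# custom-evaluator section below; the type annotations are assumptions
# and may vary between versions.
async def evaluate(self, agent, inputs: dict, tools: list) -> dict:
    ...
```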
The returned dict is validated against the agent's output
signature.
## 1. Built-in Evaluators
| Class | Location | Description |
|---|---|---|
| `DeclarativeEvaluator` | `flock.evaluators.declarative` | Default. Generates a prompt from the agent's signatures and calls `litellm`. |
| `DSPyEvaluator` | `flock.evaluators.dspy` | Integrates with DSPy for structured prompting & optimisation. |
| `RuleEvaluator` | `flock.evaluators.rule` | Pure-Python rules engine (no LLM). Useful for testing. |
## 2. Configuring Evaluators
All evaluators accept a `*Config` companion (a Pydantic model) with the relevant fields. Example for `DeclarativeEvaluator`:
```python
from flock.evaluators.declarative import DeclarativeEvaluator, DeclarativeEvaluatorConfig

config = DeclarativeEvaluatorConfig(
    model="anthropic/claude-3-opus",
    temperature=0.2,
    max_tokens=2048,
    stream=True,
    include_thought_process=False,
    use_cache=True,
)

evaluator = DeclarativeEvaluator(name="default", config=config)
```
You can then inject the evaluator when you instantiate the agent or replace it later:
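A minimal sketch of both options (the exact `FlockAgent` constructor arguments may differ by version; the `question`/`answer` signature is illustrative):

```python
from flock.core import FlockAgent  # import path assumed

agent = FlockAgent(
    name="my_agent",
    input="question",     # illustrative input signature
    output="answer",      # illustrative output signature
    evaluator=evaluator,  # inject at construction time
)

# ...or swap the evaluator on an existing agent later:
agent.evaluator = evaluator
```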
## 3. Writing a Custom Evaluator
- Subclass `FlockEvaluator` and optionally declare a `FlockEvaluatorConfig`.
- Implement `async evaluate(self, agent, inputs, tools)`.
- Register with the registry (optional) for serialization:
```python
from flock.core import flock_component
from flock.core.flock_evaluator import FlockEvaluator  # path assumed; adjust to your version

@flock_component
class MyCoolEvaluator(FlockEvaluator):
    async def evaluate(self, agent, inputs, tools):
        # do stuff…
        return {"answer": "42"}
```
## 4. Best Practices
- Respect the `model` selected on the agent unless your evaluator has a reason to override it.
- Validate inputs early; raise `ValueError` to trigger `on_error` hooks (see the sketch after this list).
- Make expensive network calls async to keep the event loop responsive.
- Return only the fields declared in the output signature (extra fields are dropped).
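A minimal sketch of the early-validation pattern (the `question` input name is hypothetical):

```python
async def evaluate(self, agent, inputs, tools):
    # Fail fast before any network call so on_error hooks fire early.
    if "question" not in inputs:
        raise ValueError("missing required input: 'question'")
    ...
```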
➡️ Continue to Modules to see how to augment agent behaviour.