Evaluator
Assess content quality using customizable evaluation metrics
The Evaluator block uses AI to score and assess content quality based on metrics you define. Perfect for quality control, A/B testing, and ensuring your AI outputs meet specific standards.
What You Can Evaluate
- AI-Generated Content: Score chatbot responses, generated articles, or marketing copy
- User Input: Evaluate customer feedback, survey responses, or form submissions
- Content Quality: Assess clarity, accuracy, relevance, and tone
- Performance Metrics: Track improvements over time with consistent scoring
- A/B Testing: Compare different approaches with objective metrics
Configuration Options
Evaluation Metrics
Define custom metrics to evaluate content against. Each metric includes:
- Name: A short identifier for the metric
- Description: A detailed explanation of what the metric measures
- Range: The numeric range for scoring (e.g., 1-5, 0-10)
Example metrics:
- Accuracy (1-5): How factually accurate is the content?
- Clarity (1-5): How clear and understandable is the content?
- Relevance (1-5): How relevant is the content to the original query?
Content
The content to be evaluated. This can be:
- Directly provided in the block configuration
- Connected from another block's output (typically an Agent block)
- Dynamically generated during workflow execution
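For example, when evaluating an Agent block's response, the content field is usually a reference to that block's output rather than pasted text. A minimal sketch (the <agent.content> reference is illustrative; use whatever variable reference syntax your workflow supports):

content: <agent.content>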
Model Selection
Choose an AI model to perform the evaluation:
- OpenAI: GPT-4o, o1, o3, o4-mini, gpt-4.1
- Anthropic: Claude 3.7 Sonnet
- Google: Gemini 2.5 Pro, Gemini 2.0 Flash
- Other Providers: Groq, Cerebras, xAI, DeepSeek
- Local Models: Any model running on Ollama
Recommendation: Use models with strong reasoning capabilities like GPT-4o or Claude 3.7 Sonnet for more accurate evaluations.
API Key
Your API key for the selected LLM provider. This is securely stored and used for authentication.
How It Works
- The Evaluator block takes the provided content and your custom metrics
- It generates a specialized prompt that instructs the LLM to evaluate the content
- The prompt includes clear guidelines on how to score each metric
- The LLM evaluates the content and returns numeric scores for each metric
- The Evaluator block formats these scores as structured output for use in your workflow
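As a rough illustration of steps 2-4, the generated prompt resembles the sketch below; the actual wording is internal to the Evaluator block and will differ:

You are an impartial evaluator. Score the content on each metric, using only the defined range.
Metrics:
- Accuracy (1-5): How factually accurate is the content?
- Clarity (1-5): How clear and understandable is the content?
- Relevance (1-5): How relevant is the content to the original query?
Return a numeric score for each metric.
Content to evaluate:
<content>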
Inputs and Outputs
Inputs
- Content: The text or structured data to evaluate
- Metrics: Custom evaluation criteria with scoring ranges
- Model Settings: LLM provider and parameters
Outputs
- Content: A summary of the evaluation
- Model: The model used for evaluation
- Tokens: Usage statistics
- Metric Scores: Numeric scores for each defined metric
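Downstream blocks can reference these fields directly. Here is a sketch of the output shape for the example metrics above (field names and nesting are illustrative, not an exact schema):

content: "The response is accurate and clearly written, but only loosely tied to the original question."
model: gpt-4o
tokens:
  prompt: 412
  completion: 38
  total: 450
accuracy: 4
clarity: 5
relevance: 2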
Example Usage
Here's an example of how an Evaluator block might be configured for assessing customer service responses:
# Example Evaluator Configuration
metrics:
  - name: Empathy
    description: How well does the response acknowledge and address the customer's emotional state?
    range:
      min: 1
      max: 5
  - name: Solution
    description: How effectively does the response solve the customer's problem?
    range:
      min: 1
      max: 5
  - name: Clarity
    description: How clear and easy to understand is the response?
    range:
      min: 1
      max: 5
model: Anthropic/claude-3-opus
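With this configuration, the block returns one score per metric alongside the evaluation summary. For instance, a reply that fully resolves the issue but reads impersonally might come back something like this (values purely illustrative):

empathy: 2
solution: 5
clarity: 4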
Best Practices
- Use specific metric descriptions: Clearly define what each metric measures to get more accurate evaluations
- Choose appropriate ranges: Select scoring ranges that provide enough granularity without being overly complex
- Connect with Agent blocks: Use Evaluator blocks to assess Agent block outputs and create feedback loops
- Use consistent metrics: For comparative analysis, maintain consistent metrics across similar evaluations
- Combine multiple metrics: Use several metrics to get a comprehensive evaluation