Real-time evaluation of agent actions using an LLM-based critic model.
This feature is highly experimental and subject to change. The API, configuration, and behavior may evolve significantly based on feedback and testing.
A critic is an evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions. The critic runs alongside the agent and provides:
- **Quality scores**: Probability scores between 0.0 and 1.0 indicating predicted success
- **Real-time feedback**: Scores computed during agent execution, not just at completion
You can use critic scores to build automated workflows, such as triggering the agent to reflect on and fix its previous solution when the critic indicates poor task performance.
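For instance, a conversation callback can watch critic scores and, when one drops below a threshold, prompt the agent to revisit its work. The sketch below assumes an `agent` already configured with a critic (as in the full example later on this page); the 0.5 threshold, the callback name, and the reflection prompt are illustrative choices, not SDK defaults:

```python
from openhands.sdk import ActionEvent, Conversation, Event

LOW_SCORE_THRESHOLD = 0.5  # illustrative cutoff, not an SDK default
needs_reflection = False


def watch_critic(event: Event) -> None:
    """Flag the run when the critic predicts an action is likely to fail."""
    global needs_reflection
    if isinstance(event, ActionEvent) and event.critic_result is not None:
        if event.critic_result.score < LOW_SCORE_THRESHOLD:
            needs_reflection = True


# `agent` must be an Agent configured with a critic (see the example below).
conversation = Conversation(agent=agent, callbacks=[watch_critic])
conversation.send_message("Create a file called GREETING.txt with a greeting.")
conversation.run()

if needs_reflection:
    # Ask the agent to reflect on and repair its previous solution.
    conversation.send_message(
        "Some of your recent actions were scored as likely failures. "
        "Review your previous solution and fix any problems you find."
    )
    conversation.run()
```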
When using the OpenHands LLM Provider (llm-proxy.*.all-hands.dev), the critic is configured automatically; no additional setup is required.
examples/01_standalone_sdk/34_critic_example.py
"""Example demonstrating critic-based evaluation of agent actions.This is EXPERIMENTAL.This shows how to configure an agent with a critic to evaluate action qualityin real-time. The critic scores are displayed in the conversation visualizer.For All-Hands LLM proxy (llm-proxy.*.all-hands.dev), the critic is auto-configuredusing the same base_url with /vllm suffix and "critic" as the model name."""import osimport refrom openhands.sdk import LLM, Agent, Conversation, Toolfrom openhands.sdk.critic import APIBasedCriticfrom openhands.sdk.critic.base import CriticBasefrom openhands.tools.file_editor import FileEditorToolfrom openhands.tools.task_tracker import TaskTrackerToolfrom openhands.tools.terminal import TerminalTooldef get_required_env(name: str) -> str: value = os.getenv(name) if value: return value raise ValueError( f"Missing required environment variable: {name}. " f"Set {name} before running this example." )def get_default_critic(llm: LLM) -> CriticBase | None: """Auto-configure critic for All-Hands LLM proxy. When the LLM base_url matches `llm-proxy.*.all-hands.dev`, returns an APIBasedCritic configured with: - server_url: {base_url}/vllm - api_key: same as LLM - model_name: "critic" Returns None if base_url doesn't match or api_key is not set. """ base_url = llm.base_url api_key = llm.api_key if base_url is None or api_key is None: return None # Match: llm-proxy.{env}.all-hands.dev (e.g., staging, prod, eval) pattern = r"^https?://llm-proxy\.[^./]+\.all-hands\.dev" if not re.match(pattern, base_url): return None return APIBasedCritic( server_url=f"{base_url.rstrip('/')}/vllm", api_key=api_key, model_name="critic", )llm_api_key = get_required_env("LLM_API_KEY")llm = LLM( model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), api_key=llm_api_key, base_url=os.getenv("LLM_BASE_URL", None),)# Try auto-configuration for All-Hands proxy, fall back to explicit env varscritic = get_default_critic(llm)if critic is None: critic = APIBasedCritic( server_url=get_required_env("CRITIC_SERVER_URL"), api_key=get_required_env("CRITIC_API_KEY"), model_name=get_required_env("CRITIC_MODEL_NAME"), )# Configure agent with criticagent = Agent( llm=llm, tools=[ Tool(name=TerminalTool.name), Tool(name=FileEditorTool.name), Tool(name=TaskTrackerTool.name), ], # Add critic to evaluate agent actions critic=critic,)cwd = os.getcwd()conversation = Conversation(agent=agent, workspace=cwd)conversation.send_message( "Create a file called GREETING.txt with a friendly greeting message.")conversation.run()print("\nAll done! Check the output above for 'Critic Score' in the visualizer.")
Running the Example
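The example needs `LLM_API_KEY` set in your environment. If your `LLM_BASE_URL` is not an All-Hands proxy endpoint, also set `CRITIC_SERVER_URL`, `CRITIC_API_KEY`, and `CRITIC_MODEL_NAME` so the critic can be configured explicitly.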
```bash
uv run python examples/01_standalone_sdk/34_critic_example.py
```
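Critic results are also attached to conversation events, so you can consume them programmatically instead of reading the visualizer output. A callback like the following checks `critic_result` on action and message events (it assumes an `agent` configured with a critic, as above):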
```python
from openhands.sdk import ActionEvent, Conversation, Event, MessageEvent


def callback(event: Event) -> None:
    # Critic results are attached to action and message events.
    if isinstance(event, (ActionEvent, MessageEvent)):
        if event.critic_result is not None:
            print(f"Critic score: {event.critic_result.score:.3f}")
            print(f"Success: {event.critic_result.success}")


conversation = Conversation(agent=agent, callbacks=[callback])
```