# Agent Proactivity Evaluation - Test Prompts
## Purpose

This document contains a sequence of prompts used to evaluate the agent's proactive behavior before and after implementing the improvements described in `IMPLEMENTATION-proactive-agent.md`.
- **Test Scenario**: User asks "Which codes apply to this project?" without providing location or occupancy information.
- **Expected Behavior (After Fix)**: Agent autonomously calls `GetProjectMetadata` and `GetArchitecturalPlan` to gather location and occupancy, then responds with the applicable codes in 1 turn.
- **Actual Behavior (Before Fix)**: Agent asks the user for information, requiring 6 turns to get the answer.
## Test Project Context
- Project ID: `san-jose-multi-file3`
- Project Name: "The Sonora Condos (multi-file)"
- Location: 1550 Technology Dr, San Jose, CA 95110 (available via `GetProjectMetadata`)
- Occupancy: Residential apartments (determinable from `GetArchitecturalPlan` page summaries)
- Expected Answer: California Building Code 2022 + ICC A117.1
## Prompt Sequence (Baseline - Before Fix)
Send these prompts in order to reproduce the reactive behavior:
### Prompt 1
What can you help me with?
Expected: General capabilities overview
### Prompt 2 (Key Test)
Which regulatory Codes and standards apply to this project?
Before Fix:
- Agent lists all codes
- Asks: "Please tell me the location and building type"
After Fix:
- Agent calls `GetProjectMetadata` → gets San Jose, CA
- Agent calls `GetArchitecturalPlan` → sees "apartment units" in summaries
- Agent responds: "California Building Code 2022 and ICC A117.1 apply"
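The after-fix chain above can be sketched as straight-line code. This is a hedged illustration only: the class and method names (`getProjectMetadata`, `getArchitecturalPlanSummaries`) are hypothetical stand-ins for the real tools, and the returned values are hard-coded from the test project context.

```java
import java.util.List;

public class ProactiveChainSketch {
    // Hypothetical stand-in for the GetProjectMetadata tool (value from the test project).
    static String getProjectMetadata() {
        return "1550 Technology Dr, San Jose, CA 95110";
    }

    // Hypothetical stand-in for the GetArchitecturalPlan page summaries.
    static List<String> getArchitecturalPlanSummaries() {
        return List.of("Level 2 floor plan: apartment units A-D", "Roof plan");
    }

    public static void main(String[] args) {
        // Step 1: location comes from project metadata, not from the user.
        String address = getProjectMetadata();
        boolean inCalifornia = address.contains(", CA");

        // Step 2: occupancy is inferred from the plan page summaries.
        boolean residential = getArchitecturalPlanSummaries().stream()
                .anyMatch(s -> s.toLowerCase().contains("apartment"));

        // Step 3: synthesize the answer in the same turn.
        if (inCalifornia && residential) {
            System.out.println("California Building Code 2022 and ICC A117.1 apply");
        }
    }
}
```

The point of the sketch is that no step requires user input: each fact the agent needs is reachable through a tool call it can make on its own.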
### Prompt 3 (User forced to provide help)
Can you lookup the address of this project?
Expected: Agent calls `GetProjectMetadata` and provides the address
Problem: User shouldn't need to ask this - agent should have done it proactively in Prompt 2
### Prompt 4 (User forced to provide more help)
Can you lookup the occupancy type?
Before Fix: Agent says it cannot find it directly and asks the user to confirm
Problem: Agent should call `GetArchitecturalPlan` autonomously
### Prompt 5 (User explicitly instructs agent how to do its job)
Can you look at the project's table of contents (files and pages) and drill down into a page that may have the occupancy type information?
Expected: Agent calls `GetArchitecturalPlan`, scans the summaries, and determines residential occupancy
Problem: User had to explicitly tell agent the exact steps to take
### Prompt 6 (Finally getting the answer)
Okay, so now, can you figure which regulatory codes and standards apply?
Expected: Agent synthesizes location + occupancy → California Building Code 2022
Problem: Took 6 prompts to answer what should have been answered in Prompt 2
## Success Criteria (After Fix)
When the fix is implemented, Prompt 2 alone should produce the complete answer:
User: "Which regulatory Codes and standards apply to this project?"
Agent: [Calls GetProjectMetadata → GetArchitecturalPlan → Determines codes]
Response: "Based on the project location (San Jose, California) and occupancy type (residential multi-family apartments), the applicable codes are:
1. California Building Code (CBC) 2022 (Title 24, Part 2)
2. ICC A117.1 Accessible and Usable Buildings and Facilities (2017)"
Metrics:
- Turns: 2 (instead of 6) = ~67% reduction
- Tool calls: 3 autonomous calls (instead of asking the user)
- Time to answer: ~25 seconds (instead of ~120 seconds) = ~79% faster
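The percentages come from a simple before/after reduction formula (4/6 ≈ 66.7%, so the turn reduction appears as 66% or 67% depending on rounding). The arithmetic, shown for transparency rather than as part of any harness:

```java
public class EvalMetrics {
    // Percentage reduction from a "before" value to an "after" value.
    static double percentReduction(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        // 6 turns -> 2 turns, ~120 s -> ~25 s (figures from the metrics above)
        System.out.printf("Turn reduction: %.0f%%%n", percentReduction(6, 2));    // ~67%
        System.out.printf("Time reduction: %.0f%%%n", percentReduction(120, 25)); // ~79%
    }
}
```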
## How to Run This Evaluation
### Before Fix (Baseline)
- Use current production agent with existing system prompt
- Send prompts 1-6 in sequence
- Observe: Agent requires 6 turns and constant user guidance
### After Fix (Target)
- Deploy updated system prompt from `src/main/resources/prompts/proactive-system-prompt-v2.txt`
- Set `maxSteps=15` in `ChatAgentService.java`
- Send Prompt 2 only
- Observe: Agent autonomously gathers info and provides complete answer in 1 turn
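Both runs can be driven by a small harness along these lines. This is a sketch only: `sendPrompt` is a hypothetical placeholder for however the deployment actually invokes the agent (HTTP client, service method, etc.), and it echoes the prompt rather than calling anything real.

```java
import java.util.List;

public class EvalHarnessSketch {
    // Placeholder for the real agent call; replace with the actual client invocation.
    static String sendPrompt(String prompt) {
        return "(agent reply to: " + prompt + ")";
    }

    public static void main(String[] args) {
        List<String> baselinePrompts = List.of(
                "What can you help me with?",
                "Which regulatory Codes and standards apply to this project?",
                "Can you lookup the address of this project?",
                "Can you lookup the occupancy type?",
                "Can you look at the project's table of contents (files and pages) and drill down "
                        + "into a page that may have the occupancy type information?",
                "Okay, so now, can you figure which regulatory codes and standards apply?");

        // Baseline run: send all six prompts and note the turn at which the codes appear.
        for (int i = 0; i < baselinePrompts.size(); i++) {
            System.out.println("Turn " + (i + 1) + ": " + sendPrompt(baselinePrompts.get(i)));
        }

        // After-fix run: Prompt 2 alone should yield the complete answer.
        System.out.println(sendPrompt(baselinePrompts.get(1)));
    }
}
```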
## Related Documentation
- Executive Summary - High-level overview
- Detailed Analysis - Root cause and solutions
- Implementation Guide - Step-by-step instructions
- Issue #285 - GitHub tracking