Agent Proactivity Evaluation - Test Prompts

Purpose

This document contains a sequence of prompts used to evaluate the agent's proactive behavior before and after implementing the improvements described in IMPLEMENTATION-proactive-agent.md.

Test Scenario: User asks "Which codes apply to this project?" without providing location or occupancy information.

Expected Behavior (After Fix): Agent autonomously calls GetProjectMetadata and GetArchitecturalPlan to gather the location and occupancy, then responds with the applicable codes in a single turn.

Actual Behavior (Before Fix): Agent asks the user for the information instead, requiring 6 turns to reach the answer.


Test Project Context

  • Project ID: san-jose-multi-file3
  • Project Name: "The Sonora Condos (multi-file)"
  • Location: 1550 Technology Dr, San Jose, CA 95110 (available via GetProjectMetadata)
  • Occupancy: Residential apartments (determinable from GetArchitecturalPlan page summaries)
  • Expected Answer: California Building Code 2022 + ICC A117.1
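
For concreteness, here is a minimal sketch of the data the two tools surface for this project. The record shapes, field names, and page label are illustrative assumptions, not the actual tool schemas; only the quoted values come from this document (runs as-is in JShell):

  // Illustrative shapes only; the real tool schemas live in the agent codebase.
  record ProjectMetadata(String projectId, String projectName, String address) {}
  record PageSummary(String page, String summary) {}

  var metadata = new ProjectMetadata(
      "san-jose-multi-file3",
      "The Sonora Condos (multi-file)",
      "1550 Technology Dr, San Jose, CA 95110");

  // A page summary mentioning "apartment units" is what lets the agent infer
  // residential occupancy without asking the user.
  var page = new PageSummary("A-101 (hypothetical)", "Typical floor plan with apartment units");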

Prompt Sequence (Baseline - Before Fix)

Send these prompts in order to reproduce the reactive behavior:

Prompt 1

What can you help me with?

Expected: General capabilities overview


Prompt 2 (Key Test)

Which regulatory Codes and standards apply to this project?

Before Fix:

  • Agent lists all codes
  • Asks: "Please tell me the location and building type"

After Fix:

  • Agent calls GetProjectMetadata → Gets San Jose, CA
  • Agent calls GetArchitecturalPlan → Sees "apartment units" in summaries
  • Agent responds: "California Building Code 2022 and ICC A117.1 apply"
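
For intuition, proactivity here is the standard autonomous tool-calling loop allowed to iterate until the model has what it needs. The sketch below is a generic illustration with hypothetical type names, not the actual implementation; the real loop lives in ChatAgentService.java:

  import java.util.List;

  class AgentLoopSketch {
      interface Llm { ModelReply chat(List<String> transcript); }
      interface Tools { String execute(String toolCall); }
      record ModelReply(String text, List<String> toolCalls) {}

      // Runs the model until it stops requesting tools or the step budget runs out.
      static String run(Llm llm, Tools tools, List<String> transcript, int maxSteps) {
          for (int step = 0; step < maxSteps; step++) {
              ModelReply reply = llm.chat(transcript);
              if (reply.toolCalls().isEmpty()) {
                  return reply.text();                  // final answer: the applicable codes
              }
              for (String call : reply.toolCalls()) {   // e.g. GetProjectMetadata, then
                  transcript.add(tools.execute(call));  // GetArchitecturalPlan
              }
          }
          return "step budget exhausted";
      }
  }

Raising maxSteps (step 2 under "After Fix" below) gives this loop room to chain several tool calls before answering.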

Prompt 3 (User forced to help the agent)

Can you look up the address of this project?

Expected: Agent calls GetProjectMetadata and provides address

Problem: The user shouldn't need to ask this; the agent should have done it proactively in Prompt 2


Prompt 4 (User forced to help the agent again)

Can you look up the occupancy type?

Before Fix: Agent says it cannot find the occupancy type directly and asks the user to confirm

Problem: The agent should have called GetArchitecturalPlan autonomously


Prompt 5 (User explicitly instructs the agent how to do its job)

Can you look at the project's table of contents (files and pages) and drill down into a page that may have the occupancy type information?

Expected: Agent calls GetArchitecturalPlan, scans the page summaries, and determines residential occupancy

Problem: The user had to tell the agent the exact steps to take


Prompt 6 (Finally getting the answer)

Okay, so now, can you figure out which regulatory codes and standards apply?

Expected: Agent synthesizes location + occupancy → California Building Code 2022

Problem: It took 6 prompts to answer a question that should have been answered at Prompt 2


Success Criteria (After Fix)

When the fix is implemented, Prompt 2 alone should produce the complete answer:

User: "Which regulatory Codes and standards apply to this project?"

Agent: [Calls GetProjectMetadata → GetArchitecturalPlan → Determines codes]

Response: "Based on the project location (San Jose, California) and
occupancy type (residential multi-family apartments), the applicable
codes are:

1. California Building Code (CBC) 2022 (Title 24, Part 2)
2. ICC A117.1 Accessible and Usable Buildings and Facilities (2017)"

Metrics:

  • Turns: 2 (instead of 6) = 67% reduction
  • Tool calls: 3 autonomous calls (instead of asking the user)
  • Time to answer: ~25 seconds (instead of ~120 seconds) = 79% faster
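
The percentages follow directly from the raw counts; a quick check in plain Java (runnable in JShell):

  double turnReduction = 1.0 - 2.0 / 6.0;     // 0.666... -> ~67% fewer turns
  double speedup       = 1.0 - 25.0 / 120.0;  // 0.791... -> ~79% faster
  System.out.printf("%.0f%% fewer turns, %.0f%% faster%n",
      turnReduction * 100, speedup * 100);    // prints: 67% fewer turns, 79% faster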

How to Run This Evaluation

Before Fix (Baseline)

  1. Use the current production agent with the existing system prompt
  2. Send Prompts 1-6 in sequence (a scripted version is sketched after this list)
  3. Observe: Agent requires 6 turns and constant user guidance
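
The baseline run can be scripted for repeatability. The sketch below assumes a hypothetical ChatAgentClient with a send(projectId, prompt) method as a stand-in for however the production agent is actually invoked; only the prompts and the project ID come from this document:

  import java.time.Duration;
  import java.time.Instant;
  import java.util.List;

  public class BaselineEval {
      // Hypothetical stand-in for the real agent client; adapt to the actual API.
      interface ChatAgentClient {
          String send(String projectId, String prompt);
      }

      public static void main(String[] args) {
          ChatAgentClient client = null; // TODO: wire up the production agent here
          List<String> prompts = List.of(
              "What can you help me with?",
              "Which regulatory Codes and standards apply to this project?",
              "Can you look up the address of this project?",
              "Can you look up the occupancy type?",
              "Can you look at the project's table of contents (files and pages) and "
                  + "drill down into a page that may have the occupancy type information?",
              "Okay, so now, can you figure out which regulatory codes and standards apply?");
          Instant start = Instant.now();
          int turn = 0;
          for (String prompt : prompts) {
              turn++;
              System.out.printf("Turn %d:%n%s%n%n", turn, client.send("san-jose-multi-file3", prompt));
          }
          System.out.println("Turns: " + turn + ", elapsed: "
              + Duration.between(start, Instant.now()).toSeconds() + "s");
      }
  }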

After Fix (Target)

  1. Deploy the updated system prompt from src/main/resources/prompts/proactive-system-prompt-v2.txt
  2. Set maxSteps=15 in ChatAgentService.java
  3. Send Prompt 2 only (see the sketch after this list)
  4. Observe: Agent autonomously gathers info and provides complete answer in 1 turn
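
The after-fix check reduces to a single call plus an assertion on the success criteria above. It reuses the hypothetical ChatAgentClient stand-in from the baseline sketch; the expected strings come straight from the Success Criteria section, and the exact maxSteps wiring depends on how ChatAgentService.java configures its agent loop:

  public class ProactiveEval {
      // Same hypothetical stand-in as in the baseline sketch.
      interface ChatAgentClient {
          String send(String projectId, String prompt);
      }

      public static void main(String[] args) {
          ChatAgentClient client = null; // TODO: agent with proactive-system-prompt-v2.txt, maxSteps=15
          String reply = client.send("san-jose-multi-file3",
              "Which regulatory Codes and standards apply to this project?");
          // Pass only if the single reply already names both expected codes.
          boolean pass = reply.contains("California Building Code")
                  && reply.contains("A117.1");
          System.out.println(pass ? "PASS: complete answer in 1 turn" : "FAIL:\n" + reply);
      }
  }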