Proactive Agent Implementation Guide

Quick Summary

The agent is currently reactive (it asks the user for missing information) instead of proactive (it gathers information autonomously via its tools), which makes conversations inefficient.

Example:

  • Current: 3-4 message exchanges to answer "Which codes apply?"
  • Target: 1 message (agent gathers location + occupancy automatically)

Root Causes

  1. The system prompt tells the agent to ASK the user for missing information
  2. maxSteps is left at its low default (typically 1-3), so the agent stops after at most a few tool calls
  3. No goal-decomposition instructions - the agent doesn't know how to chain tool calls

Implementation (Estimated: 2-3 hours)

Step 1: Update System Prompt ⭐ HIGHEST PRIORITY

File: src/main/java/org/codetricks/construction/code/assistant/service/ChatAgentService.java

Action: Replace the getSystemPrompt() method with the new proactive version.

private String getSystemPrompt() {
    // Load the new proactive prompt from the classpath
    try (InputStream is = getClass().getClassLoader()
            .getResourceAsStream("prompts/proactive-system-prompt-v2.txt")) {
        if (is == null) {
            throw new IllegalStateException("Prompt resource not found on classpath");
        }
        return new String(is.readAllBytes(), StandardCharsets.UTF_8);
    } catch (Exception e) {
        logger.error("Failed to load proactive prompt, using fallback", e);
        return getFallbackPrompt(); // Keep old prompt as fallback
    }
}

New prompt location: src/main/resources/prompts/proactive-system-prompt-v2.txt

Key changes:

  • ✅ Adds "PROACTIVE AUTONOMOUS BEHAVIOR" section
  • ✅ Provides multi-step ReAct loop instructions
  • ✅ Includes detailed workflow examples
  • ✅ Removes all "ask user" language for tool-retrievable info
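
The prompt file itself is the source of truth; for orientation only, the new behavior section might read along these lines (illustrative wording, not the shipped text):

PROACTIVE AUTONOMOUS BEHAVIOR
Before asking the user for anything, try to retrieve it yourself:
1. Call GetProjectMetadata for location, project type, and other metadata.
2. Call GetArchitecturalPlan and scan page summaries for occupancy clues.
3. Only ask the user when no tool can supply the information.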

Step 2: Increase maxSteps

File: src/main/java/org/codetricks/construction/code/assistant/service/ChatAgentService.java

Current (line ~107):

return LlmAgent.builder()
        .name(AGENT_NAME)
        .model(Model.GEMINI_2_5_FLASH.getModelName())
        .generateContentConfig(...)
        .includeContents(LlmAgent.IncludeContents.DEFAULT)
        .description("AI assistant for PermitProof building code compliance")
        .instruction(getSystemPrompt())
        .tools(toolset.getTools().toArray(new BaseTool[0]))
        .build();

Updated:

return LlmAgent.builder()
        .name(AGENT_NAME)
        .model(Model.GEMINI_2_5_FLASH.getModelName())
        .generateContentConfig(...)
        .includeContents(LlmAgent.IncludeContents.DEFAULT)
        .description("AI assistant for PermitProof building code compliance")
        .instruction(getSystemPrompt()) // ⬅️ Now loads proactive prompt
        .tools(toolset.getTools().toArray(new BaseTool[0]))
        .maxSteps(15) // ⬅️ CRITICAL: Allow multi-step tool chaining
        .build();

Why: the default maxSteps is low (typically 1-3). Increasing it to 15 allows:

  • GetProjectMetadata (step 1)
  • GetArchitecturalPlan (step 2)
  • GetAvailableBookInfo (step 3)
  • GetMultipleIccBookInfo (step 4)
  • Final response (step 5)

Plus safety margin for complex queries.
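
If the step limit needs tuning during rollout, it could be externalized to configuration; a minimal sketch assuming Spring-style property injection (the property name is made up):

// Hypothetical: read the step limit from configuration so it can be
// tuned per environment without a redeploy. Property name is illustrative.
@Value("${agent.max-steps:15}") // falls back to 15 if the property is unset
private int maxSteps;

// ...and in the builder:
// .maxSteps(maxSteps)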


Testing

Test Case 1: Applicable Codes Query

Input:

User: "Which regulatory codes and standards apply to this project?"
Context: { projectId: "san-jose-multi-file3" }

Expected Behavior:

Proactive (New):

  1. Agent calls GetProjectMetadata("san-jose-multi-file3")
    • Observes: Location is "San Jose, CA"
  2. Agent calls GetArchitecturalPlan("san-jose-multi-file3")
    • Observes: Summaries mention "apartment units"
  3. Agent calls GetAvailableBookInfo()
    • Observes: California Building Code 2022 is available
  4. Agent responds:
    Based on the project location (San Jose, California) and occupancy type 
    (residential multi-family), the applicable codes are:
    - California Building Code (CBC) 2022
    - ICC A117.1 Accessibility Standards (2017)

Reactive (Old):

  1. Agent calls GetAvailableBookInfo()
  2. Agent responds:
    The following codes are available:
    - IBC 2021
    - California Building Code 2022
    - IRC 2021
    ...

    To determine which codes apply, please tell me:
    1. Project location
    2. Building type/occupancy

Validation:

  • Count tool calls: Should be 3+ (not just 1)
  • Check response: Should include specific codes (not just a list)
  • No questions: Agent should NOT ask for location or occupancy
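
These checks can be partially automated; a minimal sketch, assuming a hypothetical test harness that records tool invocations (AgentTestHarness and its methods are illustrative, not an ADK API):

// Illustrative JUnit-style check; the harness API is hypothetical.
AgentTestHarness harness = new AgentTestHarness(chatAgentService);
AgentResult result = harness.send(
        "Which regulatory codes and standards apply to this project?",
        Map.of("projectId", "san-jose-multi-file3"));

assertTrue(result.toolCalls().size() >= 3, "expected multi-step tool chaining");
assertTrue(result.text().contains("California Building Code"));
assertFalse(result.text().toLowerCase().contains("please tell me"),
        "agent must not ask for tool-retrievable info");

The same pattern covers Test Cases 2 and 3 below.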

Test Case 2: Occupancy Type Query

Input:

User: "What is the occupancy type for this project?"
Context: { projectId: "san-jose-multi-file3" }

Expected:

  1. Agent calls GetProjectMetadata (checks metadata)
  2. If not found, calls GetArchitecturalPlan
  3. Scans page summaries for occupancy clues
  4. Responds: "This is a residential multi-family (Group R-2) project based on the architectural plans showing apartment units."

Should NOT:

  • Ask user "What type of building is this?"
  • Ask user to provide occupancy information

Test Case 3: Code Violations Query

Input:

User: "Do we have any code violations?"
Context: { projectId: "san-jose-multi-file3", pageNumber: 2 }

Expected:

  1. Agent calls GetAvailableAnalysis(projectId, pageNumber)
  2. If reports exist: Calls GetPageComplianceReport
  3. Returns violations if found
  4. If no reports: Proactively determines which code applies (see Test 1), then asks permission to run analysis

Should NOT:

  • Immediately ask "Which code should I check?"
  • Skip checking for existing analysis

Metrics to Track

Before (Baseline)

Measure current performance:

-- Average messages per resolved query
SELECT
    AVG(message_count) AS avg_messages_per_query
FROM chat_sessions
WHERE resolved = true
  AND created_at > NOW() - INTERVAL '30 days';

Expected baseline: ~3.5 messages/query

After (Target)

Target: ~1.8 messages/query (50% reduction)

Tool Call Metrics

-- Average tool calls per message
SELECT
    AVG(tool_call_count) AS avg_tools_per_message
FROM assistant_messages
WHERE created_at > NOW() - INTERVAL '7 days';

Baseline: ~1.2 tools/message
Target: ~2.5 tools/message


Rollout Strategy

Phase 1: Canary Test (10% of users, 1 week)

  1. Deploy proactive prompt to 10% of users (see the bucketing sketch after this list)
  2. Monitor metrics:
    • Messages per query
    • Tool calls per message
    • User satisfaction (survey)
  3. Check for issues:
    • Excessive tool calls (>10 per message = potential loop)
    • Token costs (per-message input tokens may rise to ~1.5x current due to the longer prompt; net cost per query should stay within ~10-15% of baseline - see Risk 2)
    • Latency (should be acceptable due to parallel tool calls)
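
One simple way to implement the 10% split is deterministic bucketing on the user ID; a sketch (the helper below is illustrative, not an existing utility):

// Deterministic percentage bucketing: the same user always lands in the
// same bucket, so the canary population stays stable across sessions.
private boolean inProactiveCanary(String userId, int percentage) {
    int bucket = Math.floorMod(userId.hashCode(), 100);
    return bucket < percentage; // percentage = 10 for Phase 1
}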

Phase 2: Gradual Rollout (50%, then 100%)

If canary metrics show improvement:

  • Week 2: 50% of users
  • Week 3: 100% of users

Phase 3: Optimization

After full rollout:

  1. Analyze common query patterns
  2. Add more proactive workflow examples to prompt
  3. Consider caching frequently-accessed data (project metadata, available codes)

Risk Mitigation

Risk 1: Infinite Tool Call Loops

Mitigation:

  • maxSteps=15 acts as circuit breaker
  • ADK should prevent infinite loops automatically
  • Monitor for messages hitting step limit

Detection:

// Add logging in ChatAgentService
if (stepCount >= 10) {
    logger.warn("High step count detected: {} steps for session {}",
            stepCount, sessionId);
}

Risk 2: Increased Costs

Mitigation:

  • Longer prompts add roughly 500 input tokens per message (~$0.0005)
  • More tool calls cost the same as before - they are now automated instead of user-prompted
  • Net impact: ~10-15% cost increase, offset by fewer messages per query

Monitoring:

// Track token usage per session
logger.info("Session {} - Input tokens: {}, Output tokens: {}",
sessionId, inputTokens, outputTokens);

Risk 3: Latency Increase

Mitigation:

  • Tool calls can run in parallel (ADK handles this)
  • Overall latency may actually decrease (1 message vs 3-4)
  • Set reasonable timeouts (a generic sketch follows below)

Expected:

  • Before: 3 messages × 2-3 seconds = 6-9 seconds total
  • After: 1 message × 4-6 seconds = 4-6 seconds total
  • Result: ~30% faster to resolution
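
For the timeout in particular, a generic illustration of bounding end-to-end agent latency (not ADK-specific; callAgent and fallbackResponse are placeholders):

// Fail fast if a multi-step run exceeds the latency budget, instead of
// hanging the chat UI.
CompletableFuture
        .supplyAsync(() -> callAgent(userMessage))  // callAgent is hypothetical
        .orTimeout(30, TimeUnit.SECONDS)
        .exceptionally(e -> fallbackResponse(e));   // fallbackResponse is hypothetical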

Rollback Plan

If issues arise:

Quick Rollback (< 5 minutes)

Option 1: Revert to old prompt

private String getSystemPrompt() {
    // Temporarily disable proactive prompt
    return LEGACY_PROMPT; // Old reactive prompt
}

Option 2: Feature flag

private String getSystemPrompt() {
    if (featureFlags.isEnabled("proactive-agent")) {
        return loadProactivePrompt();
    }
    return loadLegacyPrompt();
}

Gradual Rollback

If only affecting some users:

  • Reduce canary percentage
  • Investigate specific failing queries
  • Fix prompt and re-deploy

Success Criteria

After 2 weeks at 100% rollout:

Primary Metrics:

  • Messages per query: < 2.0 (from baseline ~3.5)
  • User satisfaction: > 4.2/5.0 stars
  • Tool call accuracy: > 90% (calls retrieve relevant info)

Secondary Metrics:

  • Time to resolution: < 30 seconds (from ~60 seconds)
  • Cost per query: < 115% of baseline
  • Error rate: < 2% of messages hitting the maxSteps limit

Next Steps (Optional Future Improvements)

After successful rollout:

  1. Add session-based goal tracking

    • Store decomposed goals in session state
    • Track progress across multiple messages
    • Handle complex multi-turn workflows
  2. Implement caching (a minimal sketch follows this list)

    • Cache project metadata for active sessions
    • Cache available book info (changes rarely)
    • Reduce redundant tool calls
  3. Add self-correction

    • If tool call fails, try alternative approach
    • Example: If GetProjectMetadata returns no address, try extracting from plan summaries
  4. Create a planning agent

    • Decompose complex queries into explicit task lists
    • Execute tasks in optimal order
    • Handle dependencies between tasks
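
A minimal sketch of the caching idea in item 2, using a simple TTL map (all names below are illustrative):

// Cache project metadata so repeated queries in a session skip the tool call.
private final Map<String, CachedEntry> metadataCache = new ConcurrentHashMap<>();

private record CachedEntry(String json, Instant fetchedAt) {}

private String getProjectMetadataCached(String projectId) {
    CachedEntry entry = metadataCache.get(projectId);
    if (entry != null
            && entry.fetchedAt().isAfter(Instant.now().minus(Duration.ofMinutes(10)))) {
        return entry.json(); // still fresh: skip the tool call
    }
    String json = fetchProjectMetadata(projectId); // hypothetical tool wrapper
    metadataCache.put(projectId, new CachedEntry(json, Instant.now()));
    return json;
}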

Code Checklist

  • Copy proactive-system-prompt-v2.txt to src/main/resources/prompts/
  • Update ChatAgentService.getSystemPrompt() to load new prompt
  • Add .maxSteps(15) to LlmAgent.builder()
  • Add logging for step count monitoring
  • Test with sample queries (see Test Cases above)
  • Deploy to canary environment
  • Monitor metrics for 1 week
  • Roll out gradually to production

Questions?

  • Will this break existing functionality? No - it only changes the prompt and allows more steps. All tools remain the same.
  • What if the agent makes wrong tool calls? The maxSteps limit prevents runaway loops, and the LLM should learn from prompt examples.
  • Can we A/B test this? Yes - use feature flags to route 50% to proactive, 50% to reactive.
  • What about token costs? Expect ~10-15% increase due to longer prompts, offset by faster resolution.

Contact

For implementation questions:

  • Technical: See docs/02-developer-playbooks/02-playbook.md
  • Testing: Run in Dev UI first (see ADK documentation)