Proactive Agent Implementation Guide

Quick Summary

The agent is currently reactive (it asks the user for missing information) instead of proactive (it gathers information autonomously via its tools), which makes conversations inefficient.

Example:

  • Current: 3-4 message exchanges to answer "Which codes apply?"
  • Target: 1 message (agent gathers location + occupancy automatically)

Root Causes

  1. The system prompt tells the agent to ASK the user for missing information
  2. maxSteps is left at its low default (typically 1-3), so the agent stops after at most a few tool calls
  3. No goal-decomposition instructions - the agent doesn't know how to chain tool calls

Implementation (Estimated: 2-3 hours)

Step 1: Update System Prompt ⭐ HIGHEST PRIORITY

File: src/main/java/org/codetricks/construction/code/assistant/service/ChatAgentService.java

Action: Replace the getSystemPrompt() method with the new proactive version.

private String getSystemPrompt() {
    // Load the new proactive prompt from the classpath
    try (InputStream is = getClass().getClassLoader()
            .getResourceAsStream("prompts/proactive-system-prompt-v2.txt")) {
        if (is == null) {
            throw new IllegalStateException("Prompt resource not found on classpath");
        }
        return new String(is.readAllBytes(), StandardCharsets.UTF_8);
    } catch (Exception e) {
        logger.error("Failed to load proactive prompt, using fallback", e);
        return getFallbackPrompt(); // Keep old prompt as fallback
    }
}

New prompt location: src/main/resources/prompts/proactive-system-prompt-v2.txt

Key changes:

  • ✅ Adds "PROACTIVE AUTONOMOUS BEHAVIOR" section
  • ✅ Provides multi-step ReAct loop instructions
  • ✅ Includes detailed workflow examples
  • ✅ Removes all "ask user" language for tool-retrievable info
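
The prompt file itself is the source of truth; for orientation only, the new behavior section might read along these lines (illustrative wording, not the shipped text):

PROACTIVE AUTONOMOUS BEHAVIOR
Before asking the user for anything, try to retrieve it yourself:
1. Call GetProjectMetadata for location, project type, and other metadata.
2. Call GetArchitecturalPlan and scan page summaries for occupancy clues.
3. Only ask the user when no tool can supply the information.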

Step 2: Increase maxSteps

File: src/main/java/org/codetricks/construction/code/assistant/service/ChatAgentService.java

Current (line ~107):

return LlmAgent.builder()
        .name(AGENT_NAME)
        .model(Model.GEMINI_2_5_FLASH.getModelName())
        .generateContentConfig(...)
        .includeContents(LlmAgent.IncludeContents.DEFAULT)
        .description("AI assistant for PermitProof building code compliance")
        .instruction(getSystemPrompt())
        .tools(toolset.getTools().toArray(new BaseTool[0]))
        .build();

Updated:

return LlmAgent.builder()
        .name(AGENT_NAME)
        .model(Model.GEMINI_2_5_FLASH.getModelName())
        .generateContentConfig(...)
        .includeContents(LlmAgent.IncludeContents.DEFAULT)
        .description("AI assistant for PermitProof building code compliance")
        .instruction(getSystemPrompt()) // ⬅️ Now loads proactive prompt
        .tools(toolset.getTools().toArray(new BaseTool[0]))
        .maxSteps(15) // ⬅️ CRITICAL: Allow multi-step tool chaining
        .build();

Why: the default maxSteps is low (typically 1-3). Increasing it to 15 allows:

  • GetProjectMetadata (step 1)
  • GetArchitecturalPlan (step 2)
  • GetAvailableBookInfo (step 3)
  • GetMultipleIccBookInfo (step 4)
  • Final response (step 5)

Plus safety margin for complex queries.
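
If the step limit needs tuning during rollout, it could be externalized to configuration; a minimal sketch assuming Spring-style property injection (the property name is made up):

// Hypothetical: read the step limit from configuration so it can be
// tuned per environment without a redeploy. Property name is illustrative.
@Value("${agent.max-steps:15}") // falls back to 15 if the property is unset
private int maxSteps;

// ...and in the builder:
// .maxSteps(maxSteps)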


Testing

Test Case 1: Applicable Codes Query

Input:

User: "Which regulatory codes and standards apply to this project?"
Context: { projectId: "san-jose-multi-file3" }

Expected Behavior:

Proactive (New):

  1. Agent calls GetProjectMetadata("san-jose-multi-file3")
    • Observes: Location is "San Jose, CA"
  2. Agent calls GetArchitecturalPlan("san-jose-multi-file3")
    • Observes: Summaries mention "apartment units"
  3. Agent calls GetAvailableBookInfo()
    • Observes: California Building Code 2022 is available
  4. Agent responds:
    Based on the project location (San Jose, California) and occupancy type 
    (residential multi-family), the applicable codes are:
    - California Building Code (CBC) 2022
    - ICC A117.1 Accessibility Standards (2017)

Reactive (Old):

  1. Agent calls GetAvailableBookInfo()
  2. Agent responds:
    The following codes are available:
    - IBC 2021
    - California Building Code 2022
    - IRC 2021
    ...

    To determine which codes apply, please tell me:
    1. Project location
    2. Building type/occupancy

Validation:

  • Count tool calls: Should be 3+ (not just 1)
  • Check response: Should include specific codes (not just a list)
  • No questions: Agent should NOT ask for location or occupancy
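
These checks can be partially automated; a minimal sketch, assuming a hypothetical test harness that records tool invocations (AgentTestHarness and its methods are illustrative, not an ADK API):

// Illustrative JUnit-style check; the harness API is hypothetical.
AgentTestHarness harness = new AgentTestHarness(chatAgentService);
AgentResult result = harness.send(
        "Which regulatory codes and standards apply to this project?",
        Map.of("projectId", "san-jose-multi-file3"));

assertTrue(result.toolCalls().size() >= 3, "expected multi-step tool chaining");
assertTrue(result.text().contains("California Building Code"));
assertFalse(result.text().toLowerCase().contains("please tell me"),
        "agent must not ask for tool-retrievable info");

The same pattern covers Test Cases 2 and 3 below.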

Test Case 2: Occupancy Type Query

Input:

User: "What is the occupancy type for this project?"
Context: { projectId: "san-jose-multi-file3" }

Expected:

  1. Agent calls GetProjectMetadata (checks metadata)
  2. If not found, calls GetArchitecturalPlan
  3. Scans page summaries for occupancy clues
  4. Responds: "This is a residential multi-family (Group R-2) project based on the architectural plans showing apartment units."

Should NOT:

  • Ask user "What type of building is this?"
  • Ask user to provide occupancy information

Test Case 3: Code Violations Query

Input:

User: "Do we have any code violations?"
Context: { projectId: "san-jose-multi-file3", pageNumber: 2 }

Expected:

  1. Agent calls GetAvailableAnalysis(projectId, pageNumber)
  2. If reports exist: Calls GetPageComplianceReport
  3. Returns violations if found
  4. If no reports: Proactively determines which code applies (see Test 1), then asks permission to run analysis

Should NOT:

  • Immediately ask "Which code should I check?"
  • Skip checking for existing analysis

Metrics to Track

Before (Baseline)

Measure current performance:

-- Average messages per resolved query
SELECT
    AVG(message_count) AS avg_messages_per_query
FROM chat_sessions
WHERE resolved = true
  AND created_at > NOW() - INTERVAL '30 days';

Expected baseline: ~3.5 messages/query

After (Target)

Target: ~1.8 messages/query (50% reduction)

Tool Call Metrics

-- Average tool calls per message
SELECT
    AVG(tool_call_count) AS avg_tools_per_message
FROM assistant_messages
WHERE created_at > NOW() - INTERVAL '7 days';

Baseline: ~1.2 tools/message
Target: ~2.5 tools/message


Rollout Strategy

Phase 1: Canary Test (10% of users, 1 week)

  1. Deploy proactive prompt to 10% of users (see the bucketing sketch after this list)
  2. Monitor metrics:
    • Messages per query
    • Tool calls per message
    • User satisfaction (survey)
  3. Check for issues:
    • Excessive tool calls (>10 per message = potential loop)
    • Token costs (per-message input tokens may rise to ~1.5x current due to the longer prompt; net cost per query should stay within ~10-15% of baseline - see Risk 2)
    • Latency (should be acceptable due to parallel tool calls)
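
One simple way to implement the 10% split is deterministic bucketing on the user ID; a sketch (the helper below is illustrative, not an existing utility):

// Deterministic percentage bucketing: the same user always lands in the
// same bucket, so the canary population stays stable across sessions.
private boolean inProactiveCanary(String userId, int percentage) {
    int bucket = Math.floorMod(userId.hashCode(), 100);
    return bucket < percentage; // percentage = 10 for Phase 1
}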

Phase 2: Gradual Rollout (50%, then 100%)

If canary metrics show improvement:

  • Week 2: 50% of users
  • Week 3: 100% of users

Phase 3: Optimization

After full rollout:

  1. Analyze common query patterns
  2. Add more proactive workflow examples to prompt
  3. Consider caching frequently-accessed data (project metadata, available codes)

Risk Mitigation

Risk 1: Infinite Tool Call Loops

Mitigation:

  • maxSteps=15 acts as circuit breaker
  • ADK should prevent infinite loops automatically
  • Monitor for messages hitting step limit

Detection:

// Add logging in ChatAgentService
if (stepCount >= 10) {
    logger.warn("High step count detected: {} steps for session {}",
            stepCount, sessionId);
}

Risk 2: Increased Costs

Mitigation:

  • Longer prompts add roughly 500 input tokens per message (~$0.0005)
  • More tool calls cost the same as before - they are now automated instead of user-prompted
  • Net impact: ~10-15% cost increase, offset by fewer messages per query

Monitoring:

// Track token usage per session
logger.info("Session {} - Input tokens: {}, Output tokens: {}",
sessionId, inputTokens, outputTokens);

Risk 3: Latency Increase

Mitigation:

  • Tool calls can run in parallel (ADK handles this)
  • Overall latency may actually decrease (1 message vs 3-4)
  • Set reasonable timeouts (a generic sketch follows below)

Expected:

  • Before: 3 messages × 2-3 seconds = 6-9 seconds total
  • After: 1 message × 4-6 seconds = 4-6 seconds total
  • Result: ~30% faster to resolution
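
For the timeout in particular, a generic illustration of bounding end-to-end agent latency (not ADK-specific; callAgent and fallbackResponse are placeholders):

// Fail fast if a multi-step run exceeds the latency budget, instead of
// hanging the chat UI.
CompletableFuture
        .supplyAsync(() -> callAgent(userMessage))  // callAgent is hypothetical
        .orTimeout(30, TimeUnit.SECONDS)
        .exceptionally(e -> fallbackResponse(e));   // fallbackResponse is hypothetical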

Rollback Plan

If issues arise:

Quick Rollback (< 5 minutes)

Option 1: Revert to old prompt

private String getSystemPrompt() {
    // Temporarily disable proactive prompt
    return LEGACY_PROMPT; // Old reactive prompt
}

Option 2: Feature flag

private String getSystemPrompt() {
    if (featureFlags.isEnabled("proactive-agent")) {
        return loadProactivePrompt();
    }
    return loadLegacyPrompt();
}

Gradual Rollback

If only affecting some users:

  • Reduce canary percentage
  • Investigate specific failing queries
  • Fix prompt and re-deploy

Success Criteria

After 2 weeks at 100% rollout:

Primary Metrics:

  • Messages per query: < 2.0 (from baseline ~3.5)
  • User satisfaction: > 4.2/5.0 stars
  • Tool call accuracy: > 90% (calls retrieve relevant info)

Secondary Metrics:

  • Time to resolution: < 30 seconds (from ~60 seconds)
  • Cost per query: < 115% of baseline
  • Error rate: < 2% of messages hitting the maxSteps limit

Next Steps (Optional Future Improvements)

After successful rollout:

  1. Add session-based goal tracking

    • Store decomposed goals in session state
    • Track progress across multiple messages
    • Handle complex multi-turn workflows
  2. Implement caching (a minimal sketch follows this list)

    • Cache project metadata for active sessions
    • Cache available book info (changes rarely)
    • Reduce redundant tool calls
  3. Add self-correction

    • If tool call fails, try alternative approach
    • Example: If GetProjectMetadata returns no address, try extracting from plan summaries
  4. Create a planning agent

    • Decompose complex queries into explicit task lists
    • Execute tasks in optimal order
    • Handle dependencies between tasks
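
A minimal sketch of the caching idea in item 2, using a simple TTL map (all names below are illustrative):

// Cache project metadata so repeated queries in a session skip the tool call.
private final Map<String, CachedEntry> metadataCache = new ConcurrentHashMap<>();

private record CachedEntry(String json, Instant fetchedAt) {}

private String getProjectMetadataCached(String projectId) {
    CachedEntry entry = metadataCache.get(projectId);
    if (entry != null
            && entry.fetchedAt().isAfter(Instant.now().minus(Duration.ofMinutes(10)))) {
        return entry.json(); // still fresh: skip the tool call
    }
    String json = fetchProjectMetadata(projectId); // hypothetical tool wrapper
    metadataCache.put(projectId, new CachedEntry(json, Instant.now()));
    return json;
}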

Code Checklist

  • Copy proactive-system-prompt-v2.txt to src/main/resources/prompts/
  • Update ChatAgentService.getSystemPrompt() to load new prompt
  • Add .maxSteps(15) to LlmAgent.builder()
  • Add logging for step count monitoring
  • Test with sample queries (see Test Cases above)
  • Deploy to canary environment
  • Monitor metrics for 1 week
  • Roll out gradually to production

Questions?

  • Will this break existing functionality? No - it only changes the prompt and allows more steps. All tools remain the same.
  • What if the agent makes wrong tool calls? The maxSteps limit prevents runaway loops, and the LLM should learn from prompt examples.
  • Can we A/B test this? Yes - use feature flags to route 50% to proactive, 50% to reactive.
  • What about token costs? Expect ~10-15% increase due to longer prompts, offset by faster resolution.

Contact

For implementation questions:

  • Technical: See docs/02-developer-playbooks/02-playbook.md
  • Testing: Run in Dev UI first (see ADK documentation)