Proactive Agent Implementation Guide
Quick Summary
The agent is currently reactive (it asks the user for information) instead of proactive (it gathers information autonomously), which makes conversations inefficient.
Example:
- Current: 3-4 message exchanges to answer "Which codes apply?"
- Target: 1 message (agent gathers location + occupancy automatically)
Root Causes
- The system prompt tells the agent to ASK the user for missing information
- maxSteps = 1 (default) - the agent stops after one tool call (see the sketch below)
- No goal-decomposition instructions - the agent doesn't know how to chain tool calls
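For intuition, the loop that maxSteps bounds looks roughly like the sketch below. This is not the ADK's internal code; AgentAction, callLlm, and executeTool are hypothetical names used only to show the shape of the ReAct loop. With maxSteps = 1 the agent gets a single decision, so after one tool call it has no budget left to chain further lookups and falls back to asking the user.

import java.util.ArrayList;
import java.util.List;

// Sketch only - not the ADK internals. AgentAction, callLlm, and
// executeTool are hypothetical stand-ins used for illustration.
record AgentAction(boolean finalAnswer, String text, String toolName, String args) {}

String runAgent(String userMessage, int maxSteps) {
    List<String> scratchpad = new ArrayList<>();
    scratchpad.add(userMessage);
    for (int step = 0; step < maxSteps; step++) {
        AgentAction action = callLlm(scratchpad);  // decide: call a tool or answer
        if (action.finalAnswer()) {
            return action.text();                  // goal reached within the step budget
        }
        // Feed the tool observation back into context and loop (ReAct)
        scratchpad.add(executeTool(action.toolName(), action.args()));
    }
    return callLlm(scratchpad).text();             // budget exhausted: best-effort answer
}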
Implementation (Estimated: 2-3 hours)
Step 1: Update System Prompt ⭐ HIGHEST PRIORITY
File: src/main/java/org/codetricks/construction/code/assistant/service/ChatAgentService.java
Action: Replace the getSystemPrompt() method with the new proactive version.
private String getSystemPrompt() {
    // Load the new proactive prompt from the classpath
    try (InputStream is = getClass().getClassLoader()
            .getResourceAsStream("prompts/proactive-system-prompt-v2.txt")) {
        if (is == null) {
            throw new IllegalStateException(
                    "prompts/proactive-system-prompt-v2.txt not found on classpath");
        }
        return new String(is.readAllBytes(), StandardCharsets.UTF_8);
    } catch (Exception e) {
        logger.error("Failed to load proactive prompt, using fallback", e);
        return getFallbackPrompt(); // Keep the old reactive prompt as a fallback
    }
}
New prompt location: src/main/resources/prompts/proactive-system-prompt-v2.txt
Key changes:
- ✅ Adds "PROACTIVE AUTONOMOUS BEHAVIOR" section
- ✅ Provides multi-step ReAct loop instructions
- ✅ Includes detailed workflow examples
- ✅ Removes all "ask user" language for tool-retrievable info
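For reference, the heart of the new prompt's proactive section might read along these lines (an illustrative excerpt, not the actual v2 text, which lives in the file above):

PROACTIVE AUTONOMOUS BEHAVIOR
Before asking the user for anything, check whether a tool can retrieve it.
Work in a Reason -> Act -> Observe loop:
1. Decompose the question into the facts you need (location, occupancy, ...).
2. Call tools to gather each fact (GetProjectMetadata, GetArchitecturalPlan, ...).
3. Only ask the user for information that no tool can provide.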
Step 2: Increase maxSteps
File: src/main/java/org/codetricks/construction/code/assistant/service/ChatAgentService.java
Current (line ~107):
return LlmAgent.builder()
.name(AGENT_NAME)
.model(Model.GEMINI_2_5_FLASH.getModelName())
.generateContentConfig(...)
.includeContents(LlmAgent.IncludeContents.DEFAULT)
.description("AI assistant for PermitProof building code compliance")
.instruction(getSystemPrompt())
.tools(toolset.getTools().toArray(new BaseTool[0]))
.build();
Updated:
return LlmAgent.builder()
.name(AGENT_NAME)
.model(Model.GEMINI_2_5_FLASH.getModelName())
.generateContentConfig(...)
.includeContents(LlmAgent.IncludeContents.DEFAULT)
.description("AI assistant for PermitProof building code compliance")
.instruction(getSystemPrompt()) // ⬅️ Now loads proactive prompt
.tools(toolset.getTools().toArray(new BaseTool[0]))
.maxSteps(15) // ⬅️ CRITICAL: Allow multi-step tool chaining
.build();
Why: Default maxSteps is typically 1-3. Increasing to 15 allows:
- GetProjectMetadata (step 1)
- GetArchitecturalPlan (step 2)
- GetAvailableBookInfo (step 3)
- GetMultipleIccBookInfo (step 4)
- Final response (step 5)
Plus safety margin for complex queries.
Testing
Test Case 1: Applicable Codes Query
Input:
User: "Which regulatory codes and standards apply to this project?"
Context: { projectId: "san-jose-multi-file3" }
Expected Behavior:
✅ Proactive (New):
1. Agent calls GetProjectMetadata("san-jose-multi-file3"). Observes: location is "San Jose, CA".
2. Agent calls GetArchitecturalPlan("san-jose-multi-file3"). Observes: summaries mention "apartment units".
3. Agent calls GetAvailableBookInfo(). Observes: California Building Code 2022 is available.
4. Agent responds:
   "Based on the project location (San Jose, California) and occupancy type (residential multi-family), the applicable codes are:
   - California Building Code (CBC) 2022
   - ICC A117.1 Accessibility Standards (2017)"
❌ Reactive (Old):
1. Agent calls GetAvailableBookInfo()
2. Agent responds:
   "The following codes are available:
   - IBC 2021
   - California Building Code 2022
   - IRC 2021
   ...
   To determine which codes apply, please tell me:
   1. Project location
   2. Building type/occupancy"
Validation:
- Count tool calls: Should be 3+ (not just 1)
- Check response: Should include specific codes (not just a list)
- No questions: Agent should NOT ask for location or occupancy
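The validation above can be automated along these lines. The harness method runAgentAndRecordToolCalls and the AgentRun result type are assumptions, not existing project APIs; the real test wiring depends on how ChatAgentService is exposed for testing.

import static org.junit.jupiter.api.Assertions.*;
import java.util.Map;
import org.junit.jupiter.api.Test;

// Hypothetical integration test; runAgentAndRecordToolCalls and AgentRun
// are assumed test-harness names used only to illustrate the assertions.
@Test
void applicableCodesQueryIsAnsweredProactively() {
    AgentRun run = runAgentAndRecordToolCalls(
            "Which regulatory codes and standards apply to this project?",
            Map.of("projectId", "san-jose-multi-file3"));

    // Proactive: three or more tool calls, including the metadata lookup
    assertTrue(run.toolCalls().size() >= 3);
    assertTrue(run.toolCalls().contains("GetProjectMetadata"));

    // Response names specific codes and asks no follow-up questions
    assertTrue(run.reply().contains("California Building Code"));
    assertFalse(run.reply().toLowerCase().contains("please tell me"));
}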
Test Case 2: Occupancy Type Query
Input:
User: "What is the occupancy type for this project?"
Context: { projectId: "san-jose-multi-file3" }
Expected:
1. Agent calls GetProjectMetadata (checks metadata)
2. If not found, calls GetArchitecturalPlan and scans page summaries for occupancy clues
3. Responds: "This is a residential multi-family (Group R-2) project based on the architectural plans showing apartment units."
Should NOT:
- Ask user "What type of building is this?"
- Ask user to provide occupancy information
Test Case 3: Code Violations Query
Input:
User: "Do we have any code violations?"
Context: { projectId: "san-jose-multi-file3", pageNumber: 2 }
Expected:
1. Agent calls GetAvailableAnalysis(projectId, pageNumber)
2. If reports exist: calls GetPageComplianceReport and returns violations if found
3. If no reports: proactively determines which code applies (see Test Case 1), then asks permission to run an analysis
Should NOT:
- Immediately ask "Which code should I check?"
- Skip checking for existing analysis
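Test Case 3 can use the same hypothetical harness as Test Case 1; the key assertion is that the analysis lookup happens before any question to the user.

// Same assumed harness and imports as in the Test Case 1 sketch.
@Test
void violationsQueryChecksExistingAnalysisFirst() {
    AgentRun run = runAgentAndRecordToolCalls(
            "Do we have any code violations?",
            Map.of("projectId", "san-jose-multi-file3", "pageNumber", 2));

    // The analysis lookup must come first, not a question to the user
    assertEquals("GetAvailableAnalysis", run.toolCalls().get(0));
    assertFalse(run.reply().contains("Which code should I check?"));
}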
Metrics to Track
Before (Baseline)
Measure current performance:
-- Average messages per resolved query
SELECT
AVG(message_count) as avg_messages_per_query
FROM chat_sessions
WHERE resolved = true
AND created_at > NOW() - INTERVAL '30 days';
Expected baseline: ~3.5 messages/query
After (Target)
Target: ~1.8 messages/query (50% reduction)
Tool Call Metrics
-- Average tool calls per message
SELECT
AVG(tool_call_count) as avg_tools_per_message
FROM assistant_messages
WHERE created_at > NOW() - INTERVAL '7 days';
Baseline: ~1.2 tools/message
Target: ~2.5 tools/message
Rollout Strategy
Phase 1: Canary Test (10% of users, 1 week)
- Deploy proactive prompt to 10% of users
- Monitor metrics:
- Messages per query
- Tool calls per message
- User satisfaction (survey)
- Check for issues:
- Excessive tool calls (>10 per message = potential loop)
- Token costs (should be ~1.5x current due to longer prompts)
- Latency (should be acceptable due to parallel tool calls)
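One way to implement the 10% split is deterministic bucketing on the user ID, so each user stays in the same cohort for the whole canary period. A minimal sketch; the CanaryRouter class and its wiring are assumptions, not existing code:

import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Deterministic canary bucketing: the same userId always lands in the same
// cohort, so users never flip between prompts mid-conversation.
public final class CanaryRouter {
    private final int rolloutPercent; // 10 in Phase 1, then 50, then 100

    public CanaryRouter(int rolloutPercent) {
        this.rolloutPercent = rolloutPercent;
    }

    public boolean useProactivePrompt(String userId) {
        CRC32 crc = new CRC32();
        crc.update(userId.getBytes(StandardCharsets.UTF_8));
        return (crc.getValue() % 100) < rolloutPercent;
    }
}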
Phase 2: Gradual Rollout (50%, then 100%)
If canary metrics show improvement:
- Week 2: 50% of users
- Week 3: 100% of users
Phase 3: Optimization
After full rollout:
- Analyze common query patterns
- Add more proactive workflow examples to prompt
- Consider caching frequently-accessed data (project metadata, available codes)
Risk Mitigation
Risk 1: Infinite Tool Call Loops
Mitigation:
- maxSteps(15) acts as a circuit breaker
- The ADK should prevent infinite loops automatically
- Monitor for messages hitting step limit
Detection:
// Add logging in ChatAgentService
if (stepCount >= 10) {
logger.warn("High step count detected: {} steps for session {}",
stepCount, sessionId);
}
Risk 2: Increased Costs
Mitigation:
- Longer prompts = ~500 more input tokens (~$0.0005 per message)
- More tool calls = same cost as before (automated rather than user-prompted)
- Net impact: ~10-15% cost increase, offset by fewer messages
Monitoring:
// Track token usage per session
logger.info("Session {} - Input tokens: {}, Output tokens: {}",
sessionId, inputTokens, outputTokens);
Risk 3: Latency Increase
Mitigation:
- Tool calls can run in parallel (ADK handles this)
- Overall latency may actually decrease (1 message vs 3-4)
- Set reasonable timeouts
Expected:
- Before: 3 messages × 2-3 seconds = 6-9 seconds total
- After: 1 message × 4-6 seconds = 4-6 seconds total
- Result: ~30% faster to resolution
Rollback Plan
If issues arise:
Quick Rollback (< 5 minutes)
Option 1: Revert to old prompt
private String getSystemPrompt() {
// Temporarily disable proactive prompt
return LEGACY_PROMPT; // Old reactive prompt
}
Option 2: Feature flag
private String getSystemPrompt() {
if (featureFlags.isEnabled("proactive-agent")) {
return loadProactivePrompt();
}
return loadLegacyPrompt();
}
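The featureFlags collaborator in Option 2 is not defined anywhere in this guide; if no flag service exists yet, a minimal environment-variable-backed stand-in could look like this (a sketch under that assumption):

// Minimal flag check backed by an environment variable: the flag
// "proactive-agent" reads FEATURE_PROACTIVE_AGENT=true. A real deployment
// would more likely use a proper feature flag service.
public final class EnvFeatureFlags {
    public boolean isEnabled(String flagName) {
        String envVar = "FEATURE_" + flagName.toUpperCase().replace('-', '_');
        return Boolean.parseBoolean(System.getenv(envVar));
    }
}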
Gradual Rollback
If issues affect only a subset of users:
- Reduce canary percentage
- Investigate specific failing queries
- Fix prompt and re-deploy
Success Criteria
After 2 weeks at 100% rollout:
✅ Primary Metrics:
- Messages per query: < 2.0 (from baseline ~3.5)
- User satisfaction: > 4.2/5.0 stars
- Tool call accuracy: > 90% (calls retrieve relevant info)
✅ Secondary Metrics:
- Time to resolution: < 30 seconds (from ~60 seconds)
- Cost per query: < 115% of baseline
- Error rate: < 2% (agent hitting maxSteps limit)
Next Steps (Optional Future Improvements)
After successful rollout:
1. Add session-based goal tracking
   - Store decomposed goals in session state
   - Track progress across multiple messages
   - Handle complex multi-turn workflows
2. Implement caching (a sketch follows this list)
   - Cache project metadata for active sessions
   - Cache available book info (changes rarely)
   - Reduce redundant tool calls
3. Add self-correction
   - If a tool call fails, try an alternative approach
   - Example: if GetProjectMetadata returns no address, try extracting it from plan summaries
4. Create a planning agent
   - Decompose complex queries into explicit task lists
   - Execute tasks in optimal order
   - Handle dependencies between tasks
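For the caching item (2) above, a minimal sketch keyed by project ID. ProjectMetadataCache and the fetchMetadata function are illustrative names; in practice the loader would delegate to the GetProjectMetadata tool.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Caches project metadata for active sessions so repeated queries in one
// conversation don't re-invoke GetProjectMetadata.
public final class ProjectMetadataCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> fetchMetadata; // wraps the tool call

    public ProjectMetadataCache(Function<String, String> fetchMetadata) {
        this.fetchMetadata = fetchMetadata;
    }

    public String get(String projectId) {
        // computeIfAbsent invokes the tool only on a cache miss
        return cache.computeIfAbsent(projectId, fetchMetadata);
    }

    public void invalidate(String projectId) {
        cache.remove(projectId); // e.g. after a project re-upload
    }
}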
Code Checklist
- Copy proactive-system-prompt-v2.txt to src/main/resources/prompts/
- Update ChatAgentService.getSystemPrompt() to load the new prompt
- Add .maxSteps(15) to LlmAgent.builder()
- Add logging for step count monitoring
- Test with sample queries (see Test Cases above)
- Deploy to canary environment
- Monitor metrics for 1 week
- Roll out gradually to production
Questions?
- Will this break existing functionality? No - it only changes the prompt and allows more steps. All tools remain the same.
- What if the agent makes wrong tool calls? The maxSteps limit prevents runaway loops, and the LLM should learn from prompt examples.
- Can we A/B test this? Yes - use feature flags to route 50% to proactive, 50% to reactive.
- What about token costs? Expect ~10-15% increase due to longer prompts, offset by faster resolution.
Contact
For implementation questions:
- Technical: see docs/02-developer-playbooks/02-playbook.md
- Testing: run queries in the Dev UI first (see the ADK documentation)