Agent Proactivity Improvements
Problem Statement
The PermitProof assistant is currently reactive rather than proactive. When users ask questions that require context (e.g., "Which codes apply?"), the agent asks the user for information instead of gathering it autonomously using available tools.
Example Failure Case
User: "Which regulatory Codes and standards apply to this project?"
Current Agent Behavior:
- Lists all available codes
- Asks user: "Please tell me the location and building type"
Desired Agent Behavior:
- Call GetProjectMetadata(projectId) → Get address (San Jose, CA)
- Call GetArchitecturalPlan(projectId) → Scan summaries → Determine occupancy (residential)
- Match location + occupancy → Return applicable codes (California Building Code, ICC A117.1)
- Provide the answer WITHOUT asking the user for information
Root Cause Analysis
1. System Prompt Instructs Reactive Behavior
Current Prompt (lines 247-252 in ChatAgentService.java):
c) "Do we have code violations?" or "Check for violations"
→ First: Call GetAvailableAnalysis (check existing reports)
→ If reports exist: Inform user about findings
→ If no reports: List available books (GetAvailableBookInfo)
→ Ask which code to check, or suggest based on project type ❌
→ WAIT for confirmation before running expensive analysis
The phrase "Ask which code to check" explicitly tells the agent to ask the user.
2. No Goal Decomposition Framework
The agent doesn't have instructions for:
- Breaking complex questions into sub-goals
- Identifying missing information
- Autonomously gathering that information via tool calls
- Chaining multiple tool calls to achieve a goal
3. Single-Turn Processing Model
The ChatAgentService.processMessage() method:
- Processes one user message
- Generates one assistant response
- Waits for next user input
There's no internal loop that says "keep working until the goal is achieved."
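For contrast, the missing inner loop would look roughly like the sketch below. This is a minimal illustration only; all helper names (planNextToolCall, executeTool, synthesizeAnswer) are hypothetical, and Solution 2 covers building this on ADK primitives instead of hand-rolling it.

// Hypothetical sketch of the missing inner loop: keep reasoning and calling
// tools until the question is answerable or a step budget runs out.
// planNextToolCall, executeTool, and synthesizeAnswer are illustrative stubs.
import java.util.HashMap;
import java.util.Map;

public class AgenticLoopSketch {
    private static final int MAX_STEPS = 10; // hard cap to avoid infinite loops

    public String answer(String userMessage) {
        Map<String, Object> gathered = new HashMap<>();
        for (int step = 0; step < MAX_STEPS; step++) {
            String nextTool = planNextToolCall(userMessage, gathered); // REASON
            if (nextTool == null) {                                    // enough info gathered
                return synthesizeAnswer(userMessage, gathered);        // RESPOND
            }
            gathered.put(nextTool, executeTool(nextTool, gathered));   // ACT + OBSERVE
        }
        return synthesizeAnswer(userMessage, gathered); // best effort at the budget
    }

    // Stubs standing in for LLM-driven planning and tool execution.
    private String planNextToolCall(String msg, Map<String, Object> info) { return null; }
    private Object executeTool(String tool, Map<String, Object> info) { return null; }
    private String synthesizeAnswer(String msg, Map<String, Object> info) { return ""; }
}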
Proposed Solutions
Solution 1: Rewrite System Prompt with Proactive Instructions ⭐
Add a new section to the system prompt:
PROACTIVE BEHAVIOR - CRITICAL:
When a user asks a question, you should be PROACTIVE and gather all necessary
information YOURSELF using available tools before responding. NEVER ask the user
for information that you can retrieve via tools.
Goal Decomposition Process:
1. ANALYZE the user's question and identify what information you need
2. CHECK what information is available in CONTEXT (project ID, page number, etc.)
3. IDENTIFY what additional information you need
4. AUTONOMOUSLY GATHER that information by calling appropriate tools
5. SYNTHESIZE the results and provide a complete answer
Example - "Which regulatory codes apply to this project?"
❌ WRONG (Reactive):
→ List all codes
→ Ask user: "What is the project location and building type?"
✅ CORRECT (Proactive):
→ THINK: "I need location and occupancy type to determine applicable codes"
→ ACT: Call GetProjectMetadata(projectId)
→ OBSERVE: Address is "1550 Technology Dr, San Jose, CA 95110"
→ THINK: "Location is California, I still need occupancy type"
→ ACT: Call GetArchitecturalPlan(projectId)
→ OBSERVE: Summaries mention "apartment units" and "townhouse-style units"
→ THINK: "This is residential occupancy in California"
→ ACT: Call GetMultipleIccBookInfo for California codes
→ RESPOND: "Based on the project location (San Jose, CA) and occupancy type
(residential apartments), the applicable codes are:
- California Building Code (CBC) 2022
- ICC A117.1 Accessibility Standards"
Tool Call Chaining:
- You CAN and SHOULD make multiple tool calls in sequence to gather information
- Each tool call builds on the previous one
- Continue until you have enough information to answer the user's question
- Only ask the user for clarification if information is truly unavailable via tools
Information Sources (in priority order):
1. CONTEXT section (projectId, pageNumber, userId, etc.)
2. Tool calls (GetProjectMetadata, GetArchitecturalPlan, etc.)
3. Cached session state
4. User input (ONLY as last resort)
Update the specific guidance for "Which codes apply?":
- c) "Do we have code violations?" or "Check for violations"
- → First: Call GetAvailableAnalysis (check existing reports)
- → If reports exist: Inform user about findings
- → If no reports: List available books (GetAvailableBookInfo)
- → Ask which code to check, or suggest based on project type
- → WAIT for confirmation before running expensive analysis
+ c) "Which regulatory codes apply?" or "What codes should we use?"
+ → AUTONOMOUS WORKFLOW:
+ 1. Get project location:
+ - Call GetProjectMetadata(projectId from CONTEXT)
+ - Extract address → Determine jurisdiction
+ 2. Get occupancy type:
+ - Call GetArchitecturalPlan(projectId from CONTEXT)
+ - Scan page summaries for occupancy clues (apartments, commercial, etc.)
+ 3. Match codes:
+ - Call GetAvailableBookInfo to see available codes
+ - Match jurisdiction + occupancy → Return applicable codes
+ 4. Respond with specific codes and reasoning
+ → ONLY ask user if tools don't provide enough information
+
+ d) "Do we have code violations?" or "Check for violations"
+ → AUTONOMOUS WORKFLOW:
+ 1. Call GetAvailableAnalysis (check if already analyzed)
+ 2. If exists: Call GetPageComplianceReport → Return findings
+ 3. If not: Determine applicable code (see workflow above)
+ 4. Inform user analysis will be expensive
+ 5. WAIT for confirmation before running StartPageSectionComplianceReportTask
Solution 2: Implement Multi-Step Planning in Agent Configuration
Option A: Use ADK's Planning Features (if available)
Check if Google ADK Java has planning/loop agents:
// In ChatAgentService.initAgent()
return LlmAgent.builder()
    .name(AGENT_NAME)
    .model(Model.GEMINI_2_5_FLASH.getModelName())
    .generateContentConfig(...)
    .instruction(getSystemPrompt())
    .tools(toolset.getTools().toArray(new BaseTool[0]))
    .maxSteps(10) // ⬅️ INCREASE from default (usually 1); verify this builder option exists in the ADK version in use
    // Consider using LoopAgent or PlanReActPlanner if available
    .build();
Option B: Create a Planning Wrapper
Implement a ProactivePlanningAgent that wraps the base agent:
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.google.adk.events.Event; // ADK event type (verify package path for the ADK version in use)
import io.reactivex.rxjava3.core.Flowable;

public class ProactivePlanningAgent {

    public Flowable<Event> processWithGoalDecomposition(
            String userId,
            String sessionId,
            String userMessage,
            ChatContext context) {
        // 1. Use the LLM to decompose the goal into sub-tasks
        List<Task> tasks = decomposeGoal(userMessage, context);

        // 2. Execute tasks sequentially, feeding results forward
        Map<String, Object> gatheredInfo = new HashMap<>();
        for (Task task : tasks) {
            Object result = executeTask(task, gatheredInfo, context);
            gatheredInfo.put(task.outputKey, result);
        }

        // 3. Synthesize the final response from everything gathered
        // (decomposeGoal, executeTask, and generateFinalResponse are helpers to implement)
        return generateFinalResponse(userMessage, gatheredInfo);
    }
}
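The Task type and the decomposeGoal/executeTask/generateFinalResponse helpers above do not exist yet. A minimal, hypothetical shape for Task that matches the task.outputKey access might be:

// Hypothetical supporting type for the planning sketch above; field names are illustrative.
public class Task {
    public final String description; // what this sub-task should accomplish
    public final String toolName;    // which tool to call (e.g., "GetProjectMetadata")
    public final String outputKey;   // key under which the result lands in gatheredInfo

    public Task(String description, String toolName, String outputKey) {
        this.description = description;
        this.toolName = toolName;
        this.outputKey = outputKey;
    }
}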
Solution 3: Add Task Decomposition Examples to System Prompt
Add explicit examples of multi-step reasoning:
MULTI-STEP REASONING EXAMPLES:
User: "Which regulatory codes apply to this project?"
Step 1 - Analyze Goal:
- Need: Applicable building codes
- Depends on: Project location + Occupancy type
Step 2 - Gather Location:
- Tool: GetProjectMetadata(projectId="san-jose-multi-file3")
- Result: "1550 Technology Dr, San Jose, CA 95110"
- Extract: Location = "San Jose, California"
Step 3 - Gather Occupancy:
- Tool: GetArchitecturalPlan(projectId="san-jose-multi-file3")
- Result: Pages with summaries
- Scan: Page 2 summary = "layout of apartment units"
- Extract: Occupancy = "Residential - Multi-family"
Step 4 - Match Codes:
- Tool: GetAvailableBookInfo()
- Result: List of ICC books including CBC 2022
- Logic: California jurisdiction → Use California Building Code
- Logic: Residential → Also apply ICC A117.1 (Accessibility)
Step 5 - Respond:
"Based on the project location (San Jose, California) and occupancy type
(residential multi-family), the applicable codes are:
- California Building Code (CBC) 2022
- ICC A117.1 Accessible and Usable Buildings and Facilities (2017)"
Solution 4: Implement Session-Based Goal Tracking
Add a goal tracking system:
public class GoalTracker {
    private String mainGoal;
    private List<SubGoal> subGoals;
    private Map<String, Object> gatheredInformation;

    public boolean isGoalAchieved() {
        return subGoals.stream().allMatch(SubGoal::isComplete);
    }

    public SubGoal getNextSubGoal() {
        return subGoals.stream()
            .filter(g -> !g.isComplete())
            .findFirst()
            .orElse(null);
    }
}
Store this in ADK session state so the agent can track progress across multiple internal iterations.
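GoalTracker presumes a SubGoal type; a minimal sketch (hypothetical fields) could be:

// Hypothetical SubGoal shape implied by GoalTracker above; fields are illustrative.
public class SubGoal {
    public final String description; // e.g., "determine project location"
    private Object result;           // filled in once a tool call satisfies it

    public SubGoal(String description) { this.description = description; }

    public void complete(Object result) { this.result = result; }
    public boolean isComplete() { return result != null; }
    public Object getResult() { return result; }
}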
Implementation Priority
Phase 1: Immediate (Low Effort, High Impact) ⭐⭐⭐
- Update System Prompt (2 hours)
  - Add "PROACTIVE BEHAVIOR" section
  - Rewrite examples to show tool chaining
  - Remove "ask user" language
  - Test with sample queries
Phase 2: Medium-Term (Medium Effort, High Impact) ⭐⭐
- Increase maxSteps (30 minutes)
  - Change from 1 to 10-15 steps
  - Allows agent to make multiple tool calls per user message
  - Test that it doesn't cause infinite loops
- Add Explicit Multi-Step Examples (3 hours)
  - Create 5-10 detailed examples in prompt
  - Cover common scenarios (codes, violations, sharing, etc.)
  - Include both successful and edge-case examples
Phase 3: Long-Term (High Effort, Medium Impact) ⭐
- Implement Goal Decomposition Framework (1-2 weeks)
  - Build ProactivePlanningAgent wrapper
  - Add task decomposition logic
  - Integrate with existing ChatAgentService
- Add Goal Tracking to Session State (1 week)
  - Store decomposed goals in ADK session
  - Track progress across tool calls
  - Implement "goal achieved" detection
Testing Strategy
Test Cases for Proactive Behavior
- Test: "Which codes apply to this project?"
  - Expected: Agent calls GetProjectMetadata + GetArchitecturalPlan + GetAvailableBookInfo → Returns specific codes
  - Should NOT ask the user for location or occupancy
- Test: "Do we have any violations?"
  - Expected: Agent calls GetAvailableAnalysis → If a report exists, calls GetPageComplianceReport → Returns violations
  - Should NOT ask the user which code to check
- Test: "What is the occupancy type?"
  - Expected: Agent calls GetProjectMetadata (check for occupancy metadata) → If not found, calls GetArchitecturalPlan → Analyzes summaries → Returns occupancy
  - Should NOT ask the user directly
- Test: "Which page covers electrical details?"
  - Expected: Agent calls GetArchitecturalPlan → Scans all page titles and summaries → Returns file and page number
  - Should NOT ask the user to search manually
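These cases lend themselves to automation. Below is a sketch of the first case as a JUnit test; ChatAgentTestHarness and AgentTurn are assumed helper shapes that do not exist yet, and an implementation would wrap ChatAgentService.

// Hypothetical test sketch; the harness contract is an assumption, not an existing API.
import static org.junit.jupiter.api.Assertions.*;
import java.util.List;
import org.junit.jupiter.api.Test;

class ProactivityTest {
    // Assumed shape of one completed agent turn (hypothetical).
    record AgentTurn(List<String> toolCalls, String finalResponse) {}

    // Assumed harness contract (hypothetical): runs one turn and records tool calls.
    interface ChatAgentTestHarness {
        AgentTurn send(String projectId, String userMessage);
    }

    private ChatAgentTestHarness harness; // initialized by the real test setup

    @Test
    void applicableCodesQueryDoesNotAskUser() {
        AgentTurn turn = harness.send("san-jose-multi-file3", "Which codes apply to this project?");

        // The agent should gather context itself...
        assertTrue(turn.toolCalls().contains("GetProjectMetadata"));
        assertTrue(turn.toolCalls().contains("GetArchitecturalPlan"));
        // ...and answer without bouncing the question back to the user.
        assertFalse(turn.finalResponse().toLowerCase().contains("please tell me"));
        assertTrue(turn.finalResponse().contains("California Building Code"));
    }
}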
Success Metrics
- Proactivity Rate: % of queries answered without asking user for tool-retrievable info
  - Baseline: ~30% (current)
  - Target: >80%
- Tool Calls Per Query: Average number of autonomous tool calls
  - Baseline: ~1.2
  - Target: ~2.5-3.0
- User Satisfaction: Qualitative feedback on agent helpfulness
  - Target: "Agent finds answers without constant hand-holding"
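The two quantitative metrics can be computed directly from interaction logs; here is a sketch, assuming a hypothetical InteractionLog record captured per query.

// Hypothetical metric computation over logged interactions; InteractionLog
// is an assumed record, not an existing class.
import java.util.List;

record InteractionLog(int toolCalls, boolean askedUserForRetrievableInfo) {}

class ProactivityMetrics {
    // Fraction of queries answered without asking the user for tool-retrievable info.
    static double proactivityRate(List<InteractionLog> logs) {
        long proactive = logs.stream()
            .filter(l -> !l.askedUserForRetrievableInfo())
            .count();
        return logs.isEmpty() ? 0.0 : (double) proactive / logs.size();
    }

    // Average number of autonomous tool calls per query.
    static double avgToolCallsPerQuery(List<InteractionLog> logs) {
        return logs.stream().mapToInt(InteractionLog::toolCalls).average().orElse(0.0);
    }
}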
Implementation Code
Updated System Prompt Addition
// Add this to ChatAgentService.getSystemPrompt() after line 141
PROACTIVE AUTONOMOUS BEHAVIOR - CRITICAL REQUIREMENT:
You are a PROACTIVE agent. When a user asks a question, you must AUTONOMOUSLY
gather all necessary information using available tools BEFORE responding.
NEVER ask the user for information that can be retrieved via tool calls.
ReAct Loop Process:
1. REASON: Analyze the question → Identify required information
2. ACT: Call appropriate tools to gather that information
3. OBSERVE: Examine tool results
4. REASON: Determine if you have enough information
5. If NO → ACT again (call more tools)
6. If YES → RESPOND to user
Information Retrieval Priority:
1. CONTEXT section (projectId, pageNumber, userId - already provided)
2. Tool calls (GetProjectMetadata, GetArchitecturalPlan, GetAvailableBookInfo, etc.)
3. Session state (from previous messages)
4. User input (ONLY if information is truly unavailable via tools)
Example Proactive Workflow:
User: "Which regulatory codes apply to this project?"
WRONG Approach (Reactive):
❌ Response: "I can help! Please tell me the project location and building type."
CORRECT Approach (Proactive):
✅ REASON: Need location and occupancy type to determine codes
✅ ACT: Call GetProjectMetadata(projectId from CONTEXT)
✅ OBSERVE: Address = "1550 Technology Dr, San Jose, CA 95110"
✅ REASON: Have location (California), still need occupancy type
✅ ACT: Call GetArchitecturalPlan(projectId from CONTEXT)
✅ OBSERVE: Page summaries mention "apartment units", "residential"
✅ REASON: Have all information needed (CA + Residential)
✅ ACT: Call GetAvailableBookInfo to see available codes
✅ OBSERVE: California Building Code 2022 is available
✅ RESPOND: "Based on the project location (San Jose, CA) and occupancy
(residential multi-family), the applicable codes are:
- California Building Code (CBC) 2022
- ICC A117.1 Accessibility Standards (2017)"
Tool Call Chaining:
- Make as many tool calls as needed to gather complete information
- Each tool call builds on previous results
- Continue until you can answer the user's question completely
- Do NOT stop early and ask the user for information
Common Multi-Step Scenarios:
1. Determining applicable codes:
→ GetProjectMetadata (location)
→ GetArchitecturalPlan (occupancy from summaries)
→ GetAvailableBookInfo (match codes)
2. Checking for violations:
→ GetAvailableAnalysis (check existing reports)
→ GetPageComplianceReport (retrieve violations if exists)
→ Return specific violations with code references
3. Finding specific content:
→ GetArchitecturalPlan (get all pages)
→ Search through summaries and titles
→ GetArchitecturalPlanPage (retrieve full content if needed)
Code Change in ChatAgentService.java
// Line 38: Increase max steps
return LlmAgent.builder()
    .name(AGENT_NAME)
    .model(Model.GEMINI_2_5_FLASH.getModelName())
    .generateContentConfig(
        GenerateContentConfig.builder()
            .temperature(0.3F)
            .thinkingConfig(thinkingConfig)
            .build())
    .includeContents(LlmAgent.IncludeContents.DEFAULT)
    .description("AI assistant for PermitProof building code compliance")
    .instruction(getSystemPrompt()) // ⬅️ Updated prompt with proactive instructions
    .tools(toolset.getTools().toArray(new BaseTool[0]))
    .maxSteps(15) // ⬅️ INCREASE to allow multi-step tool chaining (confirm this option exists in the ADK version in use)
    .build();
Expected Impact
With these changes:
Before (Reactive):
User: "Which codes apply?"
Agent: Lists all codes → Asks user for location and type
[3 user messages needed to get answer]
After (Proactive):
User: "Which codes apply?"
Agent:
→ Calls GetProjectMetadata
→ Calls GetArchitecturalPlan
→ Matches to codes
→ Returns: "California Building Code 2022 applies..."
[1 user message needed - agent does the work]
Result: 3x reduction in conversation turns, significantly improved user experience.