Agent Proactivity Improvements
Problem Statement
The PermitProof assistant is currently reactive rather than proactive. When users ask questions that require context (e.g., "Which codes apply?"), the agent asks the user for information instead of gathering it autonomously using available tools.
Example Failure Case
User: "Which regulatory Codes and standards apply to this project?"
Current Agent Behavior:
- Lists all available codes
- Asks user: "Please tell me the location and building type"
Desired Agent Behavior:
- Call GetProjectMetadata(projectId) → Get address (San Jose, CA)
- Call GetArchitecturalPlan(projectId) → Scan summaries → Determine occupancy (residential)
- Match location + occupancy → Return applicable codes (California Building Code, ICC A117.1)
- Provide the answer WITHOUT asking the user for information
Root Cause Analysis
1. System Prompt Instructs Reactive Behavior
Current Prompt (lines 247-252 in ChatAgentService.java):
c) "Do we have code violations?" or "Check for violations"
→ First: Call GetAvailableAnalysis (check existing reports)
→ If reports exist: Inform user about findings
→ If no reports: List available books (GetAvailableBookInfo)
→ Ask which code to check, or suggest based on project type ❌
→ WAIT for confirmation before running expensive analysis
The phrase "Ask which code to check" explicitly tells the agent to ask the user.
2. No Goal Decomposition Framework
The agent doesn't have instructions for:
- Breaking complex questions into sub-goals
- Identifying missing information
- Autonomously gathering that information via tool calls
- Chaining multiple tool calls to achieve a goal
3. Single-Turn Processing Model
The ChatAgentService.processMessage() method:
- Processes one user message
- Generates one assistant response
- Waits for next user input
There's no internal loop that says "keep working until the goal is achieved."
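For contrast, the missing inner loop would look roughly like the sketch below. This is a minimal illustration only; all helper names (planNextToolCall, executeTool, synthesizeAnswer) are hypothetical, and Solution 2 covers building this on ADK primitives instead of hand-rolling it.

// Hypothetical sketch of the missing inner loop: keep reasoning and calling
// tools until the question is answerable or a step budget runs out.
// planNextToolCall, executeTool, and synthesizeAnswer are illustrative stubs.
import java.util.HashMap;
import java.util.Map;

public class AgenticLoopSketch {
    private static final int MAX_STEPS = 10; // hard cap to avoid infinite loops

    public String answer(String userMessage) {
        Map<String, Object> gathered = new HashMap<>();
        for (int step = 0; step < MAX_STEPS; step++) {
            String nextTool = planNextToolCall(userMessage, gathered); // REASON
            if (nextTool == null) {                                    // enough info gathered
                return synthesizeAnswer(userMessage, gathered);        // RESPOND
            }
            gathered.put(nextTool, executeTool(nextTool, gathered));   // ACT + OBSERVE
        }
        return synthesizeAnswer(userMessage, gathered); // best effort at the budget
    }

    // Stubs standing in for LLM-driven planning and tool execution.
    private String planNextToolCall(String msg, Map<String, Object> info) { return null; }
    private Object executeTool(String tool, Map<String, Object> info) { return null; }
    private String synthesizeAnswer(String msg, Map<String, Object> info) { return ""; }
}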
Proposed Solutions
Solution 1: Rewrite System Prompt with Proactive Instructions ⭐
Add a new section to the system prompt:
PROACTIVE BEHAVIOR - CRITICAL:
When a user asks a question, you should be PROACTIVE and gather all necessary
information YOURSELF using available tools before responding. NEVER ask the user
for information that you can retrieve via tools.
Goal Decomposition Process:
1. ANALYZE the user's question and identify what information you need
2. CHECK what information is available in CONTEXT (project ID, page number, etc.)
3. IDENTIFY what additional information you need
4. AUTONOMOUSLY GATHER that information by calling appropriate tools
5. SYNTHESIZE the results and provide a complete answer
Example - "Which regulatory codes apply to this project?"
❌ WRONG (Reactive):
→ List all codes
→ Ask user: "What is the project location and building type?"
✅ CORRECT (Proactive):
→ THINK: "I need location and occupancy type to determine applicable codes"
→ ACT: Call GetProjectMetadata(projectId)
→ OBSERVE: Address is "1550 Technology Dr, San Jose, CA 95110"
→ THINK: "Location is California, I still need occupancy type"
→ ACT: Call GetArchitecturalPlan(projectId)
→ OBSERVE: Summaries mention "apartment units" and "townhouse-style units"
→ THINK: "This is residential occupancy in California"
→ ACT: Call GetMultipleIccBookInfo for California codes
→ RESPOND: "Based on the project location (San Jose, CA) and occupancy type
(residential apartments), the applicable codes are:
- California Building Code (CBC) 2022
- ICC A117.1 Accessibility Standards"
Tool Call Chaining:
- You CAN and SHOULD make multiple tool calls in sequence to gather information
- Each tool call builds on the previous one
- Continue until you have enough information to answer the user's question
- Only ask the user for clarification if information is truly unavailable via tools
Information Sources (in priority order):
1. CONTEXT section (projectId, pageNumber, userId, etc.)
2. Tool calls (GetProjectMetadata, GetArchitecturalPlan, etc.)
3. Cached session state
4. User input (ONLY as last resort)
Update the specific guidance for "Which codes apply?":
- c) "Do we have code violations?" or "Check for violations"
- → First: Call GetAvailableAnalysis (check existing reports)
- → If reports exist: Inform user about findings
- → If no reports: List available books (GetAvailableBookInfo)
- → Ask which code to check, or suggest based on project type
- → WAIT for confirmation before running expensive analysis
+ c) "Which regulatory codes apply?" or "What codes should we use?"
+ → AUTONOMOUS WORKFLOW:
+ 1. Get project location:
+ - Call GetProjectMetadata(projectId from CONTEXT)
+ - Extract address → Determine jurisdiction
+ 2. Get occupancy type:
+ - Call GetArchitecturalPlan(projectId from CONTEXT)
+ - Scan page summaries for occupancy clues (apartments, commercial, etc.)
+ 3. Match codes:
+ - Call GetAvailableBookInfo to see available codes
+ - Match jurisdiction + occupancy → Return applicable codes
+ 4. Respond with specific codes and reasoning
+ → ONLY ask user if tools don't provide enough information
+
+ d) "Do we have code violations?" or "Check for violations"
+ → AUTONOMOUS WORKFLOW:
+ 1. Call GetAvailableAnalysis (check if already analyzed)
+ 2. If exists: Call GetPageComplianceReport → Return findings
+ 3. If not: Determine applicable code (see workflow above)
+ 4. Inform user analysis will be expensive
+ 5. WAIT for confirmation before running StartPageSectionComplianceReportTask
Solution 2: Implement Multi-Step Planning in Agent Configuration
Option A: Use ADK's Planning Features (if available)
Check if Google ADK Java has planning/loop agents:
// In ChatAgentService.initAgent()
return LlmAgent.builder()
    .name(AGENT_NAME)
    .model(Model.GEMINI_2_5_FLASH.getModelName())
    .generateContentConfig(...)
    .instruction(getSystemPrompt())
    .tools(toolset.getTools().toArray(new BaseTool[0]))
    .maxSteps(10) // ⬅️ INCREASE from default (usually 1); verify this builder option exists in the ADK version in use
    // Consider using LoopAgent or PlanReActPlanner if available
    .build();
Option B: Create a Planning Wrapper
Implement a ProactivePlanningAgent that wraps the base agent:
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.google.adk.events.Event; // ADK event type (verify package path for the ADK version in use)
import io.reactivex.rxjava3.core.Flowable;

public class ProactivePlanningAgent {

    public Flowable<Event> processWithGoalDecomposition(
            String userId,
            String sessionId,
            String userMessage,
            ChatContext context) {
        // 1. Use the LLM to decompose the goal into sub-tasks
        List<Task> tasks = decomposeGoal(userMessage, context);

        // 2. Execute tasks sequentially, feeding results forward
        Map<String, Object> gatheredInfo = new HashMap<>();
        for (Task task : tasks) {
            Object result = executeTask(task, gatheredInfo, context);
            gatheredInfo.put(task.outputKey, result);
        }

        // 3. Synthesize the final response from everything gathered
        // (decomposeGoal, executeTask, and generateFinalResponse are helpers to implement)
        return generateFinalResponse(userMessage, gatheredInfo);
    }
}
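The Task type and the decomposeGoal/executeTask/generateFinalResponse helpers above do not exist yet. A minimal, hypothetical shape for Task that matches the task.outputKey access might be:

// Hypothetical supporting type for the planning sketch above; field names are illustrative.
public class Task {
    public final String description; // what this sub-task should accomplish
    public final String toolName;    // which tool to call (e.g., "GetProjectMetadata")
    public final String outputKey;   // key under which the result lands in gatheredInfo

    public Task(String description, String toolName, String outputKey) {
        this.description = description;
        this.toolName = toolName;
        this.outputKey = outputKey;
    }
}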
Solution 3: Add Task Decomposition Examples to System Prompt
Add explicit examples of multi-step reasoning:
MULTI-STEP REASONING EXAMPLES:
User: "Which regulatory codes apply to this project?"
Step 1 - Analyze Goal:
- Need: Applicable building codes
- Depends on: Project location + Occupancy type
Step 2 - Gather Location:
- Tool: GetProjectMetadata(projectId="san-jose-multi-file3")
- Result: "1550 Technology Dr, San Jose, CA 95110"
- Extract: Location = "San Jose, California"
Step 3 - Gather Occupancy:
- Tool: GetArchitecturalPlan(projectId="san-jose-multi-file3")
- Result: Pages with summaries
- Scan: Page 2 summary = "layout of apartment units"
- Extract: Occupancy = "Residential - Multi-family"
Step 4 - Match Codes:
- Tool: GetAvailableBookInfo()
- Result: List of ICC books including CBC 2022
- Logic: California jurisdiction → Use California Building Code
- Logic: Residential → Also apply ICC A117.1 (Accessibility)
Step 5 - Respond:
"Based on the project location (San Jose, California) and occupancy type
(residential multi-family), the applicable codes are:
- California Building Code (CBC) 2022
- ICC A117.1 Accessible and Usable Buildings and Facilities (2017)"
Solution 4: Implement Session-Based Goal Tracking
Add a goal tracking system:
public class GoalTracker {
    private String mainGoal;
    private List<SubGoal> subGoals;
    private Map<String, Object> gatheredInformation;

    public boolean isGoalAchieved() {
        return subGoals.stream().allMatch(SubGoal::isComplete);
    }

    public SubGoal getNextSubGoal() {
        return subGoals.stream()
            .filter(g -> !g.isComplete())
            .findFirst()
            .orElse(null);
    }
}
Store this in ADK session state so the agent can track progress across multiple internal iterations.
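GoalTracker presumes a SubGoal type; a minimal sketch (hypothetical fields) could be:

// Hypothetical SubGoal shape implied by GoalTracker above; fields are illustrative.
public class SubGoal {
    public final String description; // e.g., "determine project location"
    private Object result;           // filled in once a tool call satisfies it

    public SubGoal(String description) { this.description = description; }

    public void complete(Object result) { this.result = result; }
    public boolean isComplete() { return result != null; }
    public Object getResult() { return result; }
}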
Implementation Priority
Phase 1: Immediate (Low Effort, High Impact) ⭐⭐⭐
- Update System Prompt (2 hours)
  - Add "PROACTIVE BEHAVIOR" section
  - Rewrite examples to show tool chaining
  - Remove "ask user" language
  - Test with sample queries
Phase 2: Medium-Term (Medium Effort, High Impact) ⭐⭐
- Increase maxSteps (30 minutes)
  - Change from 1 to 10-15 steps
  - Allows agent to make multiple tool calls per user message
  - Test that it doesn't cause infinite loops
- Add Explicit Multi-Step Examples (3 hours)
  - Create 5-10 detailed examples in prompt
  - Cover common scenarios (codes, violations, sharing, etc.)
  - Include both successful and edge-case examples
Phase 3: Long-Term (High Effort, Medium Impact) ⭐
- Implement Goal Decomposition Framework (1-2 weeks)
  - Build ProactivePlanningAgent wrapper
  - Add task decomposition logic
  - Integrate with existing ChatAgentService
- Add Goal Tracking to Session State (1 week)
  - Store decomposed goals in ADK session
  - Track progress across tool calls
  - Implement "goal achieved" detection
Testing Strategy
Test Cases for Proactive Behavior
- Test: "Which codes apply to this project?"
  - Expected: Agent calls GetProjectMetadata + GetArchitecturalPlan + GetAvailableBookInfo → Returns specific codes
  - Should NOT ask the user for location or occupancy
- Test: "Do we have any violations?"
  - Expected: Agent calls GetAvailableAnalysis → If a report exists, calls GetPageComplianceReport → Returns violations
  - Should NOT ask the user which code to check
- Test: "What is the occupancy type?"
  - Expected: Agent calls GetProjectMetadata (check for occupancy metadata) → If not found, calls GetArchitecturalPlan → Analyzes summaries → Returns occupancy
  - Should NOT ask the user directly
- Test: "Which page covers electrical details?"
  - Expected: Agent calls GetArchitecturalPlan → Scans all page titles and summaries → Returns file and page number
  - Should NOT ask the user to search manually
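These cases lend themselves to automation. Below is a sketch of the first case as a JUnit test; ChatAgentTestHarness and AgentTurn are assumed helper shapes that do not exist yet, and an implementation would wrap ChatAgentService.

// Hypothetical test sketch; the harness contract is an assumption, not an existing API.
import static org.junit.jupiter.api.Assertions.*;
import java.util.List;
import org.junit.jupiter.api.Test;

class ProactivityTest {
    // Assumed shape of one completed agent turn (hypothetical).
    record AgentTurn(List<String> toolCalls, String finalResponse) {}

    // Assumed harness contract (hypothetical): runs one turn and records tool calls.
    interface ChatAgentTestHarness {
        AgentTurn send(String projectId, String userMessage);
    }

    private ChatAgentTestHarness harness; // initialized by the real test setup

    @Test
    void applicableCodesQueryDoesNotAskUser() {
        AgentTurn turn = harness.send("san-jose-multi-file3", "Which codes apply to this project?");

        // The agent should gather context itself...
        assertTrue(turn.toolCalls().contains("GetProjectMetadata"));
        assertTrue(turn.toolCalls().contains("GetArchitecturalPlan"));
        // ...and answer without bouncing the question back to the user.
        assertFalse(turn.finalResponse().toLowerCase().contains("please tell me"));
        assertTrue(turn.finalResponse().contains("California Building Code"));
    }
}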
Success Metrics
- Proactivity Rate: % of queries answered without asking user for tool-retrievable info
  - Baseline: ~30% (current)
  - Target: >80%
- Tool Calls Per Query: Average number of autonomous tool calls
  - Baseline: ~1.2
  - Target: ~2.5-3.0
- User Satisfaction: Qualitative feedback on agent helpfulness
  - Target: "Agent finds answers without constant hand-holding"
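The two quantitative metrics can be computed directly from interaction logs; here is a sketch, assuming a hypothetical InteractionLog record captured per query.

// Hypothetical metric computation over logged interactions; InteractionLog
// is an assumed record, not an existing class.
import java.util.List;

record InteractionLog(int toolCalls, boolean askedUserForRetrievableInfo) {}

class ProactivityMetrics {
    // Fraction of queries answered without asking the user for tool-retrievable info.
    static double proactivityRate(List<InteractionLog> logs) {
        long proactive = logs.stream()
            .filter(l -> !l.askedUserForRetrievableInfo())
            .count();
        return logs.isEmpty() ? 0.0 : (double) proactive / logs.size();
    }

    // Average number of autonomous tool calls per query.
    static double avgToolCallsPerQuery(List<InteractionLog> logs) {
        return logs.stream().mapToInt(InteractionLog::toolCalls).average().orElse(0.0);
    }
}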
Implementation Code
Updated System Prompt Addition
// Add this to ChatAgentService.getSystemPrompt() after line 141
PROACTIVE AUTONOMOUS BEHAVIOR - CRITICAL REQUIREMENT:
You are a PROACTIVE agent. When a user asks a question, you must AUTONOMOUSLY
gather all necessary information using available tools BEFORE responding.
NEVER ask the user for information that can be retrieved via tool calls.
ReAct Loop Process:
1. REASON: Analyze the question → Identify required information
2. ACT: Call appropriate tools to gather that information
3. OBSERVE: Examine tool results
4. REASON: Determine if you have enough information
5. If NO → ACT again (call more tools)
6. If YES → RESPOND to user
Information Retrieval Priority:
1. CONTEXT section (projectId, pageNumber, userId - already provided)
2. Tool calls (GetProjectMetadata, GetArchitecturalPlan, GetAvailableBookInfo, etc.)
3. Session state (from previous messages)
4. User input (ONLY if information is truly unavailable via tools)
Example Proactive Workflow:
User: "Which regulatory codes apply to this project?"
WRONG Approach (Reactive):
❌ Response: "I can help! Please tell me the project location and building type."
CORRECT Approach (Proactive):
✅ REASON: Need location and occupancy type to determine codes
✅ ACT: Call GetProjectMetadata(projectId from CONTEXT)
✅ OBSERVE: Address = "1550 Technology Dr, San Jose, CA 95110"
✅ REASON: Have location (California), still need occupancy type
✅ ACT: Call GetArchitecturalPlan(projectId from CONTEXT)
✅ OBSERVE: Page summaries mention "apartment units", "residential"
✅ REASON: Have all information needed (CA + Residential)
✅ ACT: Call GetAvailableBookInfo to see available codes
✅ OBSERVE: California Building Code 2022 is available
✅ RESPOND: "Based on the project location (San Jose, CA) and occupancy
(residential multi-family), the applicable codes are:
- California Building Code (CBC) 2022
- ICC A117.1 Accessibility Standards (2017)"
Tool Call Chaining:
- Make as many tool calls as needed to gather complete information
- Each tool call builds on previous results
- Continue until you can answer the user's question completely
- Do NOT stop early and ask the user for information
Common Multi-Step Scenarios:
1. Determining applicable codes:
→ GetProjectMetadata (location)
→ GetArchitecturalPlan (occupancy from summaries)
→ GetAvailableBookInfo (match codes)
2. Checking for violations:
→ GetAvailableAnalysis (check existing reports)
→ GetPageComplianceReport (retrieve violations if exists)
→ Return specific violations with code references
3. Finding specific content:
→ GetArchitecturalPlan (get all pages)
→ Search through summaries and titles
→ GetArchitecturalPlanPage (retrieve full content if needed)
Code Change in ChatAgentService.java
// Line 38: Increase max steps
return LlmAgent.builder()
    .name(AGENT_NAME)
    .model(Model.GEMINI_2_5_FLASH.getModelName())
    .generateContentConfig(
        GenerateContentConfig.builder()
            .temperature(0.3F)
            .thinkingConfig(thinkingConfig)
            .build())
    .includeContents(LlmAgent.IncludeContents.DEFAULT)
    .description("AI assistant for PermitProof building code compliance")
    .instruction(getSystemPrompt()) // ⬅️ Updated prompt with proactive instructions
    .tools(toolset.getTools().toArray(new BaseTool[0]))
    .maxSteps(15) // ⬅️ INCREASE to allow multi-step tool chaining (confirm this option exists in the ADK version in use)
    .build();
Expected Impact
With these changes:
Before (Reactive):
User: "Which codes apply?"
Agent: Lists all codes → Asks user for location and type
[3 user messages needed to get answer]
After (Proactive):
User: "Which codes apply?"
Agent:
→ Calls GetProjectMetadata
→ Calls GetArchitecturalPlan
→ Matches to codes
→ Returns: "California Building Code 2022 applies..."
[1 user message needed - agent does the work]
Result: 3x reduction in conversation turns, significantly improved user experience.