ExecutorService Fallback Pattern

Overview

The ExecutorService fallback provides in-process background task execution using Java's standard concurrency framework. This is used when Cloud Run Jobs are unavailable or for short-duration tasks where the overhead of a separate container is unnecessary.

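⚠️ Important: With request-based CPU allocation (the Cloud Run default), background threads are subject to CPU throttling once the gRPC response is sent and no other requests are in flight on the instance. Use only as a fallback or for tasks under 60 seconds.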

When This Pattern is Used

Automatic Fallback Conditions

  • Cloud Run Jobs initialization fails (missing GCP configuration)
  • Development environment without Cloud Run Jobs setup
  • Testing scenarios requiring quick iteration
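
The conditions above amount to "try the primary strategy, and fall back when it cannot be initialized". A minimal sketch of that selection follows. Every name in it (TaskRunner, CloudRunJobsRunner, ExecutorFallbackRunner, TaskRunnerSelector, and the GCP_PROJECT / CLOUD_RUN_JOB_NAME keys) is a hypothetical stand-in, not taken from the actual codebase, and the job launch itself is left out.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;

// Hypothetical abstraction over "where does the background work run".
interface TaskRunner {
    String name();
    void run(Runnable work);
}

// Hypothetical primary strategy: refuses to initialize without GCP configuration.
final class CloudRunJobsRunner implements TaskRunner {
    CloudRunJobsRunner(Map<String, String> config) {
        // Assumed config keys, for illustration only.
        for (String key : new String[] {"GCP_PROJECT", "CLOUD_RUN_JOB_NAME"}) {
            String value = config.get(key);
            if (value == null || value.isEmpty()) {
                throw new IllegalStateException("Missing config: " + key);
            }
        }
    }

    public String name() { return "cloud-run-jobs"; }

    public void run(Runnable work) {
        // Launching a real Cloud Run Job is outside the scope of this sketch.
        throw new UnsupportedOperationException("job launch not part of this sketch");
    }
}

// Fallback strategy: run the work in-process on the shared executor.
final class ExecutorFallbackRunner implements TaskRunner {
    private final ExecutorService executor;

    ExecutorFallbackRunner(ExecutorService executor) { this.executor = executor; }

    public String name() { return "executor-fallback"; }

    public void run(Runnable work) { executor.submit(work); }
}

final class TaskRunnerSelector {
    // Prefer Cloud Run Jobs; fall back to the executor when initialization fails.
    static TaskRunner select(Map<String, String> config, ExecutorService fallbackExecutor) {
        try {
            return new CloudRunJobsRunner(config);
        } catch (IllegalStateException e) {
            return new ExecutorFallbackRunner(fallbackExecutor);
        }
    }
}
```

In a development environment with an empty configuration map, select(...) would return the executor-backed runner, which matches the first two fallback conditions listed above.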

Suitable Task Characteristics

  • Duration: < 60 seconds
  • Resource Usage: Low to moderate CPU/memory
  • Criticality: Non-critical or can tolerate slower execution
  • Examples: Metadata updates, quick validations, short analyses

Architecture

Thread Pool Configuration

The implementation uses Executors.newCachedThreadPool() with a custom thread factory:

// From TaskServiceImpl.java
private final ExecutorService executorService = Executors.newCachedThreadPool(r -> {
    Thread thread = new Thread(r);
    thread.setDaemon(true); // Don't prevent JVM shutdown
    thread.setName("TaskService-" + thread.getId());
    return thread;
});

Thread Pool Characteristics:

  • Type: Cached thread pool (grows/shrinks dynamically)
  • Daemon Threads: Yes (they do not block JVM exit; in-flight tasks are abandoned at exit unless shutdown waits for them)
  • Thread Naming: TaskService-{id} for debugging
  • Max Threads: Unbounded (limited by system resources)
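
To make the "unbounded" characteristic concrete, the sketch below builds a pool with the same parameters that the Javadoc documents for a cached thread pool (zero core threads, no upper limit, 60-second idle timeout, direct handoff) and counts how many threads exist while several tasks are blocked at once. The class and method names are illustrative assumptions; the thread factory mirrors the one shown above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class CachedPoolGrowthDemo {

    // Same shape as a cached thread pool, built directly so that
    // getPoolSize() is available without a cast.
    static ThreadPoolExecutor newTaskExecutor() {
        return new ThreadPoolExecutor(
                0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS,
                new SynchronousQueue<>(),
                r -> {
                    Thread thread = new Thread(r);
                    thread.setDaemon(true);
                    thread.setName("TaskService-" + thread.getId());
                    return thread;
                });
    }

    // Submits the given number of tasks that all block at the same time and
    // returns how many worker threads the pool created to run them.
    static int peakThreads(int concurrentTasks) {
        ThreadPoolExecutor pool = newTaskExecutor();
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < concurrentTasks; i++) {
            pool.submit(() -> {
                try {
                    release.await(); // hold the thread until released
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int size = pool.getPoolSize();
        release.countDown(); // let all tasks finish
        pool.shutdown();
        return size;
    }
}
```

Because no task can be queued, each concurrent task occupies its own thread, so a burst of requests translates directly into a burst of threads.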

Execution Flow

  1. gRPC Request Received: Full CPU allocation
  2. Task Created: Firestore document created with PENDING status
  3. Background Thread Submitted: executorService.submit(() -> executeWork())
  4. Response Returned: gRPC response sent immediately
  5. CPU Throttled: Cloud Run throttles CPU once the response is sent and no other requests are in flight
  6. Background Work Continues: Task executes with limited CPU
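
The flow above can be sketched without gRPC or Firestore. In the example below an in-memory map stands in for the Firestore "tasks" collection, and the status strings other than PENDING are assumptions chosen for illustration. The point is only the ordering: record the task, hand the work to the pool, and return before the work finishes.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ExecutionFlowSketch {

    // Stand-in for the Firestore "tasks" collection (assumption for this sketch).
    static final Map<String, String> TASK_STATUS = new ConcurrentHashMap<>();

    static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(r -> {
        Thread thread = new Thread(r);
        thread.setDaemon(true);
        return thread;
    });

    // Steps 2 to 4: create the task record, submit the work, return immediately.
    static String handleRequest(Runnable work) {
        String taskId = UUID.randomUUID().toString();
        TASK_STATUS.put(taskId, "PENDING");
        EXECUTOR.submit(() -> {
            TASK_STATUS.put(taskId, "PROCESSING");
            try {
                work.run();
                TASK_STATUS.put(taskId, "COMPLETE");
            } catch (RuntimeException e) {
                TASK_STATUS.put(taskId, "FAILED");
            }
        });
        return taskId; // the caller gets the ID while the work is still running
    }

    // Polls the status map until the expected status appears or the timeout expires.
    static boolean awaitStatus(String taskId, String expected, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            if (expected.equals(TASK_STATUS.get(taskId))) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return expected.equals(TASK_STATUS.get(taskId));
    }
}
```

On Cloud Run, everything inside the submitted lambda runs after the response has gone out, which is exactly the portion exposed to throttling.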

Implementation

Task Service Integration

public class TaskServiceImpl extends TaskServiceGrpc.TaskServiceImplBase {

    // Imports and the firestore client field are not shown in this excerpt.
    private final ExecutorService executorService = Executors.newCachedThreadPool(/*...*/);

    /**
     * Creates a task and optionally executes background work.
     */
    public String createTask(String taskType, String requestMetadata,
            String currentUserEmail, Runnable backgroundTask) {
        // Generate task ID
        String taskId = UUID.randomUUID().toString();

        // Build the task record
        Task taskProto = Task.newBuilder()
                .setTaskId(taskId)
                .setTaskType(taskType)
                .setStatus(Task.TaskStatus.PENDING)
                .setCreatedBy(currentUserEmail)
                .build();

        // Create the Firestore document. taskData is not defined in this excerpt;
        // presumably it is the map form of taskProto. get() blocks until the write
        // completes, and handling of its checked exceptions is also not shown.
        firestore.collection("tasks").document(taskId).set(taskData).get();

        // Start background task if provided
        if (backgroundTask != null) {
            executorService.submit(backgroundTask);
        }

        return taskId;
    }

    public ExecutorService getExecutorService() {
        return executorService;
    }
}

Async Service Implementation

public class CodeApplicabilityServiceImpl
        extends CodeApplicabilityServiceGrpc.CodeApplicabilityServiceImplBase {

    private final TaskServiceImpl taskService;
    private final ExecutorService backgroundExecutor;

    // Not shown in this excerpt: the constructor, the logger, and how
    // currentUserEmail, response, and the analysis parameters (maxRelevantChapters,
    // specificChapters, sectionIds, minDepth, maxDepth) are obtained.

    @Override
    public void startAsyncCodeApplicabilityAnalysis(
            StartCodeApplicabilityAnalysisRequest request,
            StreamObserver<StartCodeApplicabilityAnalysisResponse> responseObserver) {

        // Create task in Firestore (handling of the checked exception thrown by
        // JsonFormat.printer().print() is not shown)
        String taskId = taskService.createTask(
                "code-applicability",
                JsonFormat.printer().print(request),
                currentUserEmail,
                null // Background task handled separately
        );

        // Submit the work to a background thread (FALLBACK PATH); this call returns
        // as soon as the work is submitted
        executeCodeApplicabilityBackground(taskId, request);

        // Return response immediately
        responseObserver.onNext(response);
        responseObserver.onCompleted();
    }

    private void executeCodeApplicabilityBackground(
            String taskId, StartCodeApplicabilityAnalysisRequest request) {

        backgroundExecutor.submit(() -> {
            try {
                logger.info("🔄 Starting background analysis for task: " + taskId);

                CodeApplicabilityTaskExecutor.ExecutionResult result =
                        CodeApplicabilityTaskExecutor.executeCodeApplicabilityAnalysis(
                                taskId, request.getArchitecturalProjectId(),
                                request.getPageNumber(), request.getIccBookId(),
                                maxRelevantChapters, specificChapters, sectionIds,
                                minDepth, maxDepth, taskService);

                if (result.success) {
                    logger.info("✅ Background analysis completed: " + taskId);
                } else {
                    logger.severe("❌ Background analysis failed: " + taskId);
                }

            } catch (Exception e) {
                // Record the failure so the task does not stay in a non-terminal state
                logger.severe("💥 Exception in background processing: " + e.getMessage());
                taskService.updateTaskFailed(taskId, "Failed: " + e.getMessage());
            }
        });
    }
}

CPU Throttling Behavior

What Happens After gRPC Response

  1. Response Sent: responseObserver.onCompleted() returns control
  2. Cloud Run Detection: Platform sees no in-flight requests on the instance
  3. CPU Allocation Reduced: CPU throttled to ~10% or paused entirely
  4. Background Thread Affected: Work continues but executes much slower

Performance Impact

Example: a CPU-bound task that takes 30 seconds at full CPU allocation

  • With throttling: 3-10 minutes
  • Available CPU: Drops from 100% to roughly 5-10%
  • Memory: Unaffected (still available)
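
The figures above follow from simple division, assuming the task is CPU-bound and the throttled CPU share stays constant (real throttling is unlikely to be this uniform). At a 10% share, 30 seconds of work takes about 300 seconds (5 minutes); at 5%, about 600 seconds (10 minutes). The 3-minute lower bound corresponds to a share of roughly 17%. The helper names below are illustrative.

```java
final class ThrottleEstimate {

    // Wall-clock seconds needed for work that takes fullSpeedSeconds at 100% CPU
    // when only cpuFraction (0 < cpuFraction <= 1) of the CPU is available.
    static double throttledSeconds(double fullSpeedSeconds, double cpuFraction) {
        if (cpuFraction <= 0 || cpuFraction > 1) {
            throw new IllegalArgumentException("cpuFraction must be in (0, 1]");
        }
        return fullSpeedSeconds / cpuFraction;
    }

    // Effective CPU share implied by an observed, throttled duration.
    static double impliedCpuFraction(double fullSpeedSeconds, double observedSeconds) {
        if (observedSeconds <= 0) {
            throw new IllegalArgumentException("observedSeconds must be positive");
        }
        return fullSpeedSeconds / observedSeconds;
    }
}
```

Running impliedCpuFraction against durations taken from logs gives a rough estimate of how much CPU a background task actually received.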

Mitigation Strategies

  1. Keep Tasks Short: Design for < 60 seconds of work
  2. Use Cloud Run Jobs: For anything longer, use the primary strategy
  3. Monitor Task Duration: Alert on tasks exceeding expected duration
  4. Implement Timeouts: Fail tasks that take too long
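
One way to implement the timeout in strategy 4 is to wait on the Future with a deadline and cancel the task when the deadline passes. This is a sketch, not the project's implementation: TimedTaskRunner and Outcome are hypothetical names, and cancellation only works if the task responds to thread interruption.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimedTaskRunner {

    enum Outcome { COMPLETED, TIMED_OUT, FAILED }

    // Runs the work on the executor and blocks the calling thread until the
    // work finishes, fails, or exceeds the timeout.
    static Outcome runWithTimeout(ExecutorService executor, Runnable work,
                                  long timeout, TimeUnit unit) {
        Future<?> future = executor.submit(work);
        try {
            future.get(timeout, unit);
            return Outcome.COMPLETED;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            return Outcome.FAILED; // the work itself threw an exception
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return Outcome.FAILED;
        }
    }
}
```

Since runWithTimeout blocks its caller, it would be called from a supervising background thread rather than from the gRPC handler. A TIMED_OUT outcome is the natural place to mark the task as failed.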

Real-time Progress Tracking

Firestore Integration

Background threads update Firestore for real-time progress:

// Start processing
taskService.updateTaskProgress(taskId, "processing", 10, "Starting analysis...");

// Mid-progress update
taskService.updateTaskProgress(taskId, "processing", 50, "Analyzing sections...");

// Completion
Map<String, String> result = new HashMap<>();
result.put("sections_analyzed", "42");
taskService.updateTaskComplete(taskId, "complete", 100, "Analysis complete", result);

Frontend Subscription

The Angular frontend subscribes to Firestore changes:

this.firestore
  .collection('tasks')
  .doc(taskId)
  .snapshotChanges()
  .subscribe(snapshot => {
    const task = snapshot.payload.data() as Task;
    this.updateProgressUI(task);
  });

Resource Management

Thread Pool Shutdown

Shut the thread pool down cleanly when the service stops:

public void shutdown() {
    if (executorService != null && !executorService.isShutdown()) {
        executorService.shutdown(); // Stop accepting new tasks
        try {
            // Wait for in-flight tasks to complete
            if (!executorService.awaitTermination(60, TimeUnit.SECONDS)) {
                executorService.shutdownNow(); // Interrupt tasks still running
            }
        } catch (InterruptedException e) {
            executorService.shutdownNow();
            Thread.currentThread().interrupt(); // Preserve the interrupt status
        }
    }
}
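
Because the worker threads are daemon threads, the JVM will not wait for them on exit, so a shutdown routine like the one above only helps if something calls it. A minimal sketch of wiring it to a JVM shutdown hook follows; ShutdownHookSketch and its method names are assumptions, and the timeout should be chosen to fit inside the platform's termination grace period (Cloud Run documents about 10 seconds between SIGTERM and SIGKILL, which is worth verifying for your deployment).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

final class ShutdownHookSketch {

    // Stops accepting new tasks, waits up to the timeout for running tasks,
    // then interrupts whatever is left. Returns true if everything finished in time.
    static boolean shutdownGracefully(ExecutorService executor, long timeout, TimeUnit unit) {
        executor.shutdown();
        try {
            if (executor.awaitTermination(timeout, unit)) {
                return true;
            }
            executor.shutdownNow();
            return false;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    // Registers a JVM shutdown hook that drains the executor when the process
    // receives a termination signal or exits normally.
    static Thread registerShutdownHook(ExecutorService executor, long timeout, TimeUnit unit) {
        Thread hook = new Thread(
                () -> shutdownGracefully(executor, timeout, unit),
                "executor-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

A call such as registerShutdownHook(executorService, 8, TimeUnit.SECONDS) at startup would be one way to use it; the 8-second value is only an example.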

Memory Considerations

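  • Thread Stack Size: Typically 1 MB reserved per thread on 64-bit JVMs (configurable with -Xss)
  • Cached Thread Pool: Idle threads are removed after 60 seconds
  • Task Queuing: None (a cached pool hands each task directly to a thread, so the thread count is what grows without bound; monitor it)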

Monitoring and Debugging

Logging Best Practices

Use consistent log prefixes for filtering:

logger.info("🔄 Starting background task: " + taskId);
logger.info("📈 Progress update: " + progress + "%");
logger.info("✅ Task completed: " + taskId);
logger.severe("❌ Task failed: " + taskId + " - " + error);

Cloud Run Logs

Filter for background task execution:

gcloud logging read 'resource.type="cloud_run_revision" AND textPayload=~"background"' --limit 50

Common Issues

  1. Slow Execution: CPU throttling after the response is sent
    • Solution: Migrate to Cloud Run Jobs
  2. Thread Exhaustion: Too many concurrent tasks (the cached pool creates a thread for each one)
    • Solution: Monitor thread count, implement rate limiting
  3. Memory Leaks: Tasks that never complete keep their threads and referenced objects alive
    • Solution: Implement timeouts, proper error handling
  4. Lost Progress: Task updates not visible
    • Solution: Verify Firestore permissions, check logs
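
For the rate limiting mentioned under the second issue, one option is to replace the cached pool with a bounded one that rejects work when it is saturated. The sketch below is an alternative configuration, not what TaskServiceImpl currently does; BoundedTaskExecutor, the thread limit, and the queue capacity are illustrative assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedTaskExecutor {

    // At most maxThreads tasks run concurrently and at most queueCapacity wait;
    // anything beyond that is rejected instead of creating more threads.
    static ThreadPoolExecutor create(int maxThreads, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                r -> {
                    Thread thread = new Thread(r);
                    thread.setDaemon(true);
                    thread.setName("TaskService-" + thread.getId());
                    return thread;
                },
                new ThreadPoolExecutor.AbortPolicy());
        pool.allowCoreThreadTimeOut(true); // let idle threads exit, as a cached pool would
        return pool;
    }

    // Returns false instead of throwing when the pool is saturated, so the
    // caller can mark the task as failed or ask the client to retry later.
    static boolean trySubmit(ExecutorService pool, Runnable work) {
        try {
            pool.submit(work);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }
}
```

With create(2, 1), two tasks run, a third waits in the queue, and a fourth is rejected until one of the first three finishes.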

Best Practices

  1. Use as Fallback Only: Prefer Cloud Run Jobs for production
  2. Keep Tasks Short: Design for < 60 second execution
  3. Implement Timeouts: Fail tasks that exceed expected duration
  4. Monitor Performance: Track task completion times
  5. Graceful Degradation: Handle throttling gracefully with retries
  6. Comprehensive Logging: Log all state transitions for debugging
  7. Clean Shutdown: Implement proper cleanup in shutdown hooks

Migration to Cloud Run Jobs

If you find tasks consistently hitting CPU throttling:

  1. Measure actual task duration in logs
  2. If consistently > 60 seconds, migrate to Cloud Run Jobs
  3. Follow the pattern described in the Cloud Run Jobs documentation
  4. Verify that the fallback path still works for development
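
Steps 1 and 2 can be supported by timing each task and applying a simple rule to the collected durations. The sketch below wraps a Runnable to report its wall-clock duration and treats "consistently > 60 seconds" as "the median of the samples exceeds 60 seconds". That interpretation, along with the DurationTracker name and its methods, is an assumption made for illustration.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BiConsumer;

final class DurationTracker {

    static final Duration MIGRATION_THRESHOLD = Duration.ofSeconds(60);

    // Wraps the work so that its wall-clock duration is reported when it ends,
    // whether it completes normally or throws.
    static Runnable timed(String taskId, Runnable work, BiConsumer<String, Duration> onFinish) {
        return () -> {
            long start = System.nanoTime();
            try {
                work.run();
            } finally {
                onFinish.accept(taskId, Duration.ofNanos(System.nanoTime() - start));
            }
        };
    }

    // Assumed rule: recommend migration when the median sample exceeds the
    // threshold. For an even number of samples the upper middle value is used.
    static boolean shouldMigrate(List<Duration> samples) {
        if (samples.isEmpty()) {
            return false;
        }
        List<Duration> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        Duration median = sorted.get(sorted.size() / 2);
        return median.compareTo(MIGRATION_THRESHOLD) > 0;
    }
}
```

The onFinish callback is where a duration would be logged or stored. Note that durations measured on Cloud Run already include the effect of throttling, which is the number that matters for this decision.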