Cloud Run Jobs Pattern
Overview
Cloud Run Jobs provide a separate, dedicated container execution environment designed for tasks that run to completion and are not tied to an HTTP request. This is the primary strategy for background task processing in PermitProof.
Architecture Design
The architecture separates request handling from long-running work:
- Client Request: The client makes a gRPC call to the Cloud Run Service
- Service Orchestration: The gRPC service creates a Firestore tracking document and triggers a Cloud Run Job
- Job Execution: A new container instance spins up independently of the request path
- Job Processing: The job runs with full CPU allocation and updates Firestore for progress tracking
- UI Progress Tracking: The UI subscribes to Firestore for real-time progress
Benefits:
- Full CPU allocation throughout execution
- Scalable task processing with parallel instances
- Complete isolation from request handler
- Optimal for tasks exceeding 60 seconds
Implementation
Component Structure
The implementation requires two distinct Java applications:
- gRPC Service - Request handler and job orchestrator
- Job Worker - Background task processor
Java Job Worker Application
The job worker is a standard Java application packaged as an executable JAR. It does not run a web server; it only needs a main method that performs the work and exits.
CodeApplicabilityProcessorJobMain.java (example pattern)
public class CodeApplicabilityProcessorJobMain {

    public static void main(String[] args) throws Exception {
        System.out.println("Cloud Run Job started!");

        // 1. Read arguments passed to the job
        if (args.length < 4) {
            System.err.println("Error: Missing required arguments");
            System.exit(1);
        }
        String taskId = args[0];
        String projectId = args[1];
        String pageNumber = args[2];
        String iccBookId = args[3];
        System.out.println("Processing task: " + taskId);

        // 2. Initialize services (Firestore, etc.)
        TaskServiceImpl taskService = new TaskServiceImpl();

        // 3. Perform the long-running work
        try {
            taskService.updateTaskProgress(taskId, "processing", 10, "Starting analysis...");

            // Execute actual work...
            CodeApplicabilityTaskExecutor.ExecutionResult result =
                CodeApplicabilityTaskExecutor.executeCodeApplicabilityAnalysis(
                    taskId, projectId, Integer.parseInt(pageNumber),
                    iccBookId, null, null, null, null, null, taskService);

            // 4. Mark as complete
            if (result.success) {
                System.out.println("Job finished successfully!");
                System.exit(0);
            } else {
                System.err.println("Job failed: " + result.message);
                System.exit(1);
            }
        } catch (Exception e) {
            taskService.updateTaskFailed(taskId, "Job failed: " + e.getMessage());
            System.err.println("Job failed: " + e.getMessage());
            System.exit(1);
        }
    }
}
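Because the Dockerfile below starts the worker with java -jar app.jar, the JAR must be executable and bundle its dependencies. One way to produce such a fat JAR is the Maven Shade plugin; the fragment below is a sketch, and the finalName and main class are assumptions based on the example above:

<!-- Hypothetical pom.xml fragment: bundles dependencies and sets the manifest Main-Class -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <finalName>code-applicability-processor</finalName>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>CodeApplicabilityProcessorJobMain</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>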
Dockerfile for the Job:
src/main/docker/Dockerfile
# Use a base image with Java
FROM openjdk:17-slim
# Set the working directory
WORKDIR /app
# Copy the compiled JAR file from your build process
COPY target/code-applicability-processor.jar app.jar
# The command to run when the container starts
# The args from the job execution will be appended here
ENTRYPOINT ["java", "-jar", "app.jar"]
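An illustrative build-and-deploy flow; the Artifact Registry path and job name are placeholders for your own setup:

mvn package
docker build -f src/main/docker/Dockerfile \
  -t us-central1-docker.pkg.dev/my-project/jobs/code-applicability-processor:latest .
docker push us-central1-docker.pkg.dev/my-project/jobs/code-applicability-processor:latest
gcloud run jobs deploy code-applicability-processor-dev \
  --image us-central1-docker.pkg.dev/my-project/jobs/code-applicability-processor:latest \
  --region us-central1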
Job Execution
Argument Passing and Job Initiation
Jobs are initiated from the gRPC service using the CloudRunTaskTrigger class. Arguments are passed as a list of strings.
Maven Dependency Configuration
Add this to your gRPC service's pom.xml:
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-run</artifactId>
<version>0.15.0</version>
</dependency>
gRPC Service Orchestration
The gRPC service implementation executes jobs using CloudRunTaskTrigger:
public class CodeApplicabilityServiceImpl
        extends CodeApplicabilityServiceGrpc.CodeApplicabilityServiceImplBase {

    private final CloudRunTaskTrigger jobTrigger;
    // taskService, currentUserEmail, and logger are initialized elsewhere in the class

    public CodeApplicabilityServiceImpl() {
        // Initialize Cloud Run Job trigger
        String env = getEnvironmentSuffix();
        String projectId = System.getenv("GCP_PROJECT_ID");
        String region = System.getenv("GCP_LOCATION");
        this.jobTrigger = new CloudRunTaskTrigger(
            projectId, region, "code-applicability-processor-" + env);
    }

    @Override
    public void startAsyncCodeApplicabilityAnalysis(
            StartCodeApplicabilityAnalysisRequest request,
            StreamObserver<StartCodeApplicabilityAnalysisResponse> responseObserver) {
        try {
            // Create task in Firestore
            String taskId = taskService.createTask(
                "code-applicability",
                JsonFormat.printer().print(request),
                currentUserEmail,
                null
            );

            // Trigger Cloud Run Job
            String[] jobArgs = {
                taskId,
                request.getArchitecturalProjectId(),
                String.valueOf(request.getPageNumber()),
                request.getIccBookId()
            };
            logger.info("🚀 Executing Cloud Run Job with args: " + String.join(", ", jobArgs));
            String executionName = jobTrigger.triggerJob(taskId, Arrays.asList(jobArgs));

            // Return response immediately; the job runs independently
            StartCodeApplicabilityAnalysisResponse response =
                StartCodeApplicabilityAnalysisResponse.newBuilder()
                    .setTaskId(taskId)
                    .setSuccess(true)
                    .build();
            responseObserver.onNext(response);
            responseObserver.onCompleted();
        } catch (Exception e) {
            // JsonFormat.printer().print() throws a checked exception, which an
            // overridden gRPC method cannot propagate; report it to the caller instead
            responseObserver.onError(
                io.grpc.Status.INTERNAL.withDescription(e.getMessage()).asRuntimeException());
        }
    }
}
The triggerJob() method passes arguments that become the elements of the String[] args array in the job's main method.
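CloudRunTaskTrigger is a project-internal class whose implementation is not shown here. Below is a minimal sketch of what it might look like on top of the google-cloud-run v2 client; the class shape, logging, and error handling are assumptions:

import com.google.cloud.run.v2.Execution;
import com.google.cloud.run.v2.JobName;
import com.google.cloud.run.v2.JobsClient;
import com.google.cloud.run.v2.RunJobRequest;
import java.util.List;

public class CloudRunTaskTrigger {
    private final String projectId;
    private final String region;
    private final String jobName;

    public CloudRunTaskTrigger(String projectId, String region, String jobName) {
        this.projectId = projectId;
        this.region = region;
        this.jobName = jobName;
    }

    // Starts an execution of the job, overriding the container args for this run,
    // and returns the execution name without waiting for the job to finish.
    public String triggerJob(String taskId, List<String> args) throws Exception {
        try (JobsClient jobsClient = JobsClient.create()) {
            RunJobRequest request = RunJobRequest.newBuilder()
                .setName(JobName.of(projectId, region, jobName).toString())
                .setOverrides(RunJobRequest.Overrides.newBuilder()
                    .addContainerOverrides(RunJobRequest.Overrides.ContainerOverride.newBuilder()
                        .addAllArgs(args)
                        .build())
                    .build())
                .build();
            // getMetadata() resolves as soon as the execution is created,
            // so this does not block until the job completes
            Execution execution = jobsClient.runJobAsync(request).getMetadata().get();
            System.out.println("Started execution " + execution.getName() + " for task " + taskId);
            return execution.getName();
        }
    }
}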
Task Parallelism
Execution Model
Cloud Run Jobs do not use a traditional queue-based worker pool model. Each job execution is the task itself. There is no persistent pool of workers pulling from a queue like RabbitMQ or Pub/Sub.
Parallel Task Processing
Cloud Run Jobs provide parallelism through the --tasks flag:
- Task Count: Specify the number of parallel container instances (e.g., gcloud run jobs execute my-job --tasks=50)
- Concurrent Execution: All 50 instances start simultaneously
- Task Indexing: Each instance receives a unique CLOUD_RUN_TASK_INDEX environment variable (values 0 to 49)
Work Distribution Pattern
The task index can be used to partition work across parallel instances:
// Cloud Run injects CLOUD_RUN_TASK_INDEX into every task instance
String taskIndex = System.getenv("CLOUD_RUN_TASK_INDEX");  // "0" .. "49" when --tasks=50
int index = Integer.parseInt(taskIndex);

// Example: Process 1,000 users across 50 tasks
// Task 0: users 1-20
// Task 1: users 21-40
// ...
int usersPerTask = 20;
int startUser = (index * usersPerTask) + 1;
int endUser = startUser + usersPerTask - 1;
This approach achieves large-scale parallelism without external queue management infrastructure.
Deployment Configuration
Environment Variables
Cloud Run Jobs require these environment variables:
- GCP_PROJECT_ID: Google Cloud Project ID
- GCP_LOCATION: Cloud Run region (e.g., us-central1)
- GOOGLE_APPLICATION_CREDENTIALS: For Firestore access when running outside Google Cloud (on Cloud Run, the job's attached service account is used automatically)
Job Naming Convention
Jobs are named with environment suffix for isolation:
- code-applicability-processor-{env} (Code Applicability Analysis)
- plan-ingestion-processor-{env} (Architectural Plan Ingestion)
- compliance-report-processor-{env} (Compliance Report Generation)
Where {env} is one of: dev, demo, prod
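The getEnvironmentSuffix() helper used in the service constructor above is not shown; a hypothetical version, assuming the environment name is exposed through an APP_ENV variable:

// Hypothetical helper: APP_ENV is an assumed variable name; defaults to "dev"
private static String getEnvironmentSuffix() {
    String env = System.getenv("APP_ENV");
    return (env == null || env.isEmpty()) ? "dev" : env;
}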
Resource Configuration
Configure appropriate resources in Cloud Run Job definition:
- Memory: 2GB minimum for code applicability tasks
- CPU: 2 vCPU for optimal performance
- Timeout: 30 minutes for complex analyses
- Max Retries: 2 for transient failures
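These settings correspond to flags on gcloud run jobs create and update; for example:

gcloud run jobs update code-applicability-processor-dev \
  --memory 2Gi \
  --cpu 2 \
  --task-timeout 30m \
  --max-retries 2 \
  --region us-central1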
Monitoring and Debugging
Cloud Logging
Monitor job execution via Cloud Logging:
gcloud logging read "resource.type=cloud_run_job AND resource.labels.job_name=code-applicability-processor-dev" --limit 50
Task Status Tracking
Query task status via Firestore or gRPC:
import org.codetricks.construction.code.assistant.service.GetTaskStatusRequest;
import org.codetricks.construction.code.assistant.service.GetTaskStatusResponse;
GetTaskStatusRequest request = GetTaskStatusRequest.newBuilder()
.setTaskId(taskId)
.build();
GetTaskStatusResponse response = taskServiceStub.getTaskStatus(request);
Common Issues
- Job Timeout: Increase timeout in job configuration
- Permission Errors: Verify service account has required IAM roles
- Container Startup Failures: Check Docker image and entrypoint configuration
- Firestore Access: Ensure the service account has roles/datastore.user
Best Practices
- Use for Long Tasks: Reserve Cloud Run Jobs for tasks exceeding 60 seconds
- Idempotent Design: Jobs may be retried; ensure operations are idempotent
- Graceful Shutdown: Handle SIGTERM signals for clean termination (see the sketch after this list)
- Resource Cleanup: Release resources properly in finally blocks
- Comprehensive Logging: Log progress for debugging and monitoring
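For the graceful-shutdown practice above, a minimal sketch of a JVM shutdown hook in the job worker; what state to flush or release is application-specific:

// Cloud Run sends SIGTERM before stopping the container, which triggers JVM shutdown hooks
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    System.out.println("Received shutdown signal; flushing progress and releasing resources...");
    // e.g., write a final progress update to Firestore and close any open clients
}));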