Background Job Queue System
The DeployStack backend includes a custom background job queue system for processing long-running tasks that cannot be completed within a typical HTTP request/response cycle. The system uses database-backed persistence with automatic retries and rate limiting.

Overview
The job queue system solves the challenge of processing tasks that require:
- Extended execution time - Operations taking minutes to hours
- External API dependencies - Calls to third-party services with rate limits
- Large-scale batch operations - Processing hundreds or thousands of items
- Failure resilience - Automatic retry logic for transient errors
- Progress tracking - Monitor task completion status
Architecture
The system consists of four core components working together:

JobQueueService
CRUD operations for managing jobs in the database. Handles job creation, status updates, and retrieval of pending jobs.

JobProcessorService
Background worker loop that polls the database every second, processes jobs sequentially, respects rate limits via scheduled_for timestamps, and implements exponential backoff for retries.
Worker Registry
Plugin-style pattern where each worker implements the Worker interface. Workers are registered by type (e.g., send_email, process_csv, sync_registry) and execute specific job types.
Database Tables
Two tables provide persistence: queueJobs stores individual jobs with payload and status, while queueJobBatches tracks groups of related jobs for progress monitoring.
Core Concepts
Jobs
Individual units of work with a specific type, JSON payload, and status tracking. Jobs flow through states: pending → processing → completed or failed.
Workers
Execution handlers that implement the Worker interface. Each worker is responsible for one job type and receives payload data plus context (database, logger).
Batches
Logical grouping of related jobs that enables tracking progress across multiple jobs. Useful for operations like sending 1,000 emails or processing a large CSV file.

Rate Limiting
Built-in support through the scheduled_for field. Jobs scheduled for future execution remain in pending state until their scheduled time, preventing API rate limit violations.
When to Use Job Queue vs Events
The job queue system complements rather than replaces the Global Event Bus.

Use the job queue for:
- Long-running operations (>30 seconds)
- Tasks requiring retry logic
- Operations that must survive server restarts
- Batch processing with progress tracking
- Rate-limited external API calls

Use the Event Bus for:
- Fire-and-forget notifications
- Real-time event distribution
- Triggering multiple actions from one event
- Low-latency intra-application messaging
Creating Workers
Workers live in services/backend/src/workers/ and implement the Worker interface from workers/types.ts.
Worker Interface
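The real definition lives in workers/types.ts; what follows is a hedged sketch of the interface, with field names assumed rather than taken from the actual source:

```typescript
// Hedged sketch of the Worker interface; the authoritative version is in
// workers/types.ts and may use different names.
interface WorkerContext {
  db: unknown; // database handle injected by the processor
  logger: {
    info: (msg: string, ctx?: object) => void;
    error: (msg: string, ctx?: object) => void;
  };
}

interface Worker {
  // Executes one job; throwing marks the job failed and eligible for retry.
  execute(payload: unknown, context: WorkerContext): Promise<void>;
}
```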
Basic Worker Pattern
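A minimal worker following that pattern. SendEmailWorker, its payload shape, and the logger surface are illustrative, not the actual DeployStack implementation:

```typescript
// Illustrative worker for the send_email job type; names and payload
// shape are assumptions, not the real DeployStack code.
interface WorkerContext {
  logger: { info: (msg: string, ctx?: object) => void };
}

interface SendEmailPayload {
  to: string;
  subject: string;
}

class SendEmailWorker {
  async execute(payload: unknown, context: WorkerContext): Promise<void> {
    const { to, subject } = payload as SendEmailPayload;
    // Validate the payload before doing any work (see Worker Best Practices).
    if (!to || !subject) {
      throw new Error('send_email payload missing "to" or "subject"');
    }
    context.logger.info('Sending email', { operation: 'send_email', to, subject });
    // ...call the mail service here; throwing triggers the retry logic.
  }
}
```

Note the dependencies arrive through the context argument, which keeps the worker stateless and easy to unit test with a mocked logger.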
Worker Registration
Register workers in workers/index.ts:
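A sketch of what registration could look like, assuming a simple Map-based registry; registerWorker is a hypothetical name, and only the three job types come from this page:

```typescript
// Hypothetical Map-based worker registry; the real workers/index.ts
// may expose a different API.
interface Worker {
  execute(payload: unknown): Promise<void>;
}

const workers = new Map<string, Worker>();

function registerWorker(type: string, worker: Worker): void {
  workers.set(type, worker);
}

// Job types named in this documentation:
registerWorker('send_email', { execute: async () => { /* ... */ } });
registerWorker('process_csv', { execute: async () => { /* ... */ } });
registerWorker('sync_registry', { execute: async () => { /* ... */ } });
```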
Error Handling Strategies
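Because anything thrown from execute marks the job failed and eligible for retry with exponential backoff, a worker can separate transient from permanent failures. A hedged sketch; TransientError, SyncRegistryWorker, and the external call are illustrative:

```typescript
// Illustrative error handling: rethrow transient failures so the processor
// retries with backoff; log permanent ones before giving up.
class TransientError extends Error {}

class SyncRegistryWorker {
  async execute(
    payload: { url: string },
    context: { logger: { error: (msg: string, ctx?: object) => void } },
  ): Promise<void> {
    try {
      await this.fetchRegistry(payload.url);
    } catch (err) {
      if (err instanceof TransientError) {
        // Retriable (timeout, rate limit, 5xx): let the processor retry
        // with exponential backoff (1s, 2s, 4s, ...).
        throw err;
      }
      // Permanent failure: log context, then rethrow so the job ends up
      // in the failed state once retries are exhausted.
      context.logger.error('sync_registry failed', { url: payload.url });
      throw err;
    }
  }

  private async fetchRegistry(_url: string): Promise<void> {
    // ...external HTTP call would go here (hypothetical).
  }
}
```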
Retriable Errors - Throw errors for temporary failures that might succeed on retry.

Worker Best Practices
- Keep Workers Stateless - No state between executions
- Inject Dependencies - Database, logger, and services through constructor
- Validate Payloads - Always validate before processing
- Use Structured Logging - Include context objects with operation, jobId, and relevant data
- Single Responsibility - One worker per job type
- Make Testable - Design for easy unit testing with mocked dependencies
Creating Jobs
From API Routes
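A hedged sketch of enqueuing work from a route handler; createJob's exact signature on JobQueueService, and the handler itself, are assumptions:

```typescript
// Illustrative: persist a job from a request handler instead of doing
// the slow work inside the request. Names are assumptions.
interface JobQueueService {
  createJob(input: { type: string; payload: object; scheduledFor?: Date }): Promise<{ id: string }>;
}

async function handleExportRequest(queue: JobQueueService, userId: string) {
  const job = await queue.createJob({
    type: 'process_csv',
    payload: { userId },
  });
  // Respond immediately; the processor picks the job up within ~1 second.
  return { status: 202, jobId: job.id };
}
```

Returning 202 Accepted with the job id lets the client poll for status later instead of holding the connection open.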
With Rate Limiting
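Rate limiting falls out of scheduling: spacing scheduled_for values spreads jobs over time so each one stays pending until its slot arrives. A sketch; the two-second interval and the sync_registry payloads are illustrative:

```typescript
// Spread N jobs over time via scheduled_for to respect an external
// API's rate limit. Interval and job type are illustrative.
interface JobInput {
  type: string;
  payload: object;
  scheduledFor: Date;
}

function scheduleRateLimited(items: object[], intervalMs: number, now = new Date()): JobInput[] {
  return items.map((payload, i) => ({
    type: 'sync_registry',
    payload,
    // Each job remains pending until its scheduled time has passed.
    scheduledFor: new Date(now.getTime() + i * intervalMs),
  }));
}
```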
Batch Operations
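A sketch of creating a batch row plus one job per item; the createBatch/createJob names and shapes are assumptions about JobQueueService:

```typescript
// Illustrative batch creation: one queueJobBatches row, then one job per
// item linked by batchId. Method names are assumptions.
interface BatchQueue {
  createBatch(input: { name: string; total: number }): Promise<{ id: string }>;
  createJob(input: { type: string; payload: object; batchId: string }): Promise<void>;
}

async function enqueueEmailBatch(queue: BatchQueue, recipients: string[]): Promise<string> {
  const batch = await queue.createBatch({ name: 'newsletter', total: recipients.length });
  for (const to of recipients) {
    await queue.createJob({ type: 'send_email', payload: { to }, batchId: batch.id });
  }
  // Progress can later be read from the batch row as completed/total.
  return batch.id;
}
```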
Server Integration
The job queue integrates with the backend server lifecycle in server.ts within the initializeDatabaseDependentServices function.
Initialization
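A hedged sketch of that wiring; the real constructor signatures in server.ts may differ:

```typescript
// Illustrative startup wiring; constructor signatures are assumptions.
class JobQueueService {
  constructor(private db: object) {}
}

class JobProcessorService {
  private running = false;
  constructor(private queue: JobQueueService) {}
  start(): void {
    // Begins the 1-second polling loop described under Processing Loop.
    this.running = true;
  }
  isRunning(): boolean {
    return this.running;
  }
}

function initializeDatabaseDependentServices(db: object): JobProcessorService {
  const queue = new JobQueueService(db);
  const processor = new JobProcessorService(queue);
  processor.start(); // started automatically once the database is ready
  return processor;
}
```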
The job queue system is initialized automatically after the database is ready.

Graceful Shutdown
Graceful shutdown is handled in the onClose hook to ensure current jobs complete before server shutdown. The stop() method will:
- Stop accepting new jobs from the queue
- Wait for the current job to complete (with 30-second timeout)
- Clean up resources
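A hedged sketch of that sequence; only the 30-second timeout comes from this page, the rest of the shapes are illustrative:

```typescript
// Illustrative stop() behavior: flag the loop, wait for the in-flight
// job bounded by a timeout, then clean up.
class JobProcessorService {
  private stopped = false;
  private currentJob: Promise<void> | null = null;

  async stop(timeoutMs = 30_000): Promise<void> {
    this.stopped = true; // 1. stop accepting new jobs from the queue
    if (this.currentJob) {
      // 2. wait for the current job, but never longer than the timeout
      await Promise.race([
        this.currentJob,
        new Promise<void>((resolve) => setTimeout(resolve, timeoutMs)),
      ]);
    }
    // 3. clean up timers and other resources here
  }

  isStopped(): boolean {
    return this.stopped;
  }
}

// Fastify-style hook registration (hypothetical handle names):
// app.addHook('onClose', async () => { await jobProcessor.stop(); });
```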
Job Monitoring
Job Status Lifecycle
Jobs transition through these states:
- pending - Job created, waiting to be processed
- processing - Currently executing
- completed - Executed successfully
- failed - Execution failed after all retries
Database Queries
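A hedged example of such a check; the SQLite table and column names (queue_jobs, status, created_at) are assumptions, and schema.sqlite.ts is authoritative:

```typescript
// Illustrative direct status query; table and column names are assumed
// and may differ from the real schema in schema.sqlite.ts.
const checkStatusSql = `
  SELECT id, type, status, attempts, scheduled_for
  FROM queue_jobs
  WHERE status = ?
  ORDER BY created_at DESC
  LIMIT 20
`;

// With a synchronous SQLite driver, for example (hypothetical handle):
// const rows = db.prepare(checkStatusSql).all('pending');
```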
Check job status by querying the queueJobs table directly.

Common Issues
Jobs Not Processing:
- Verify JobProcessorService is started
- Check worker is registered for job type
- Review server logs for errors
Jobs Stuck in Processing:
- Server crashed during job execution
- Manual intervention: Set status back to pending
- System will retry after exponential backoff
Jobs Failing Repeatedly:
- Worker throwing errors unnecessarily
- External service temporarily unavailable
- Review worker error handling logic
Database Schema
For the complete database schema, see schema.sqlite.ts in the backend directory.
Jobs Table
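A hedged sketch of the row shape; the field names are inferred from this page and may differ from schema.sqlite.ts:

```typescript
// Assumed shape of a queueJobs row; schema.sqlite.ts is authoritative.
type JobStatus = 'pending' | 'processing' | 'completed' | 'failed';

interface QueueJobRow {
  id: string;
  type: string;                // worker type, e.g. 'send_email'
  payload: string;             // JSON-encoded job data
  status: JobStatus;
  attempts: number;            // drives exponential backoff
  scheduledFor: number | null; // job stays pending until this time passes
  batchId: string | null;      // optional link to queueJobBatches
}
```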
Batches Table
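Likewise for batches; the counter fields here are assumptions that would support the progress tracking described above:

```typescript
// Assumed shape of a queueJobBatches row; see schema.sqlite.ts for the
// real definition.
interface QueueJobBatchRow {
  id: string;
  name: string;
  totalJobs: number;
  completedJobs: number;
  failedJobs: number;
}

// Batch progress can be derived as (completed + failed) / total:
function batchProgress(b: QueueJobBatchRow): number {
  return b.totalJobs === 0 ? 1 : (b.completedJobs + b.failedJobs) / b.totalJobs;
}
```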
System Behavior
Processing Loop
- Poll database every 1 second for pending jobs
- Check if job's scheduled_for time has passed
- Lock job by setting status to processing
- Execute worker for job type
- Update status based on result
- Implement exponential backoff for retries (1s, 2s, 4s, etc.)
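The loop above can be sketched as a single poll step. The backoff doubling matches the 1s/2s/4s sequence in the docs; maxAttempts and the function signatures are assumptions:

```typescript
// Illustrative single iteration of the processing loop; runs once per
// second in the real service. Signatures and maxAttempts are assumptions.
function backoffMs(attempt: number): number {
  return 1000 * 2 ** attempt; // 1s, 2s, 4s, ...
}

interface Job {
  id: string;
  type: string;
  attempts: number;
  scheduledFor: number;
}

async function pollOnce(
  fetchDueJob: (now: number) => Promise<Job | null>, // locks job as 'processing'
  runWorker: (job: Job) => Promise<void>,
  updateJob: (id: string, status: 'completed' | 'pending' | 'failed', scheduledFor?: number) => Promise<void>,
  now: number,
  maxAttempts = 3,
): Promise<void> {
  const job = await fetchDueJob(now); // only jobs whose scheduled_for has passed
  if (!job) return;                   // nothing due; poll again in one second
  try {
    await runWorker(job);
    await updateJob(job.id, 'completed');
  } catch {
    if (job.attempts + 1 < maxAttempts) {
      // Back to pending with exponential backoff so it is retried later.
      await updateJob(job.id, 'pending', now + backoffMs(job.attempts));
    } else {
      await updateJob(job.id, 'failed'); // retries exhausted
    }
  }
}
```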
Resource Usage
- CPU: <5% during normal operation (mostly waiting)
- Memory: Minimal footprint, jobs processed sequentially
- Database: Single query per second, additional writes during processing
- Non-blocking: Async processing doesn’t block main event loop
Performance Characteristics
- Throughput: 1-60 jobs per minute depending on job duration
- Latency: 1-second maximum delay before job starts
- Concurrency: Sequential processing prevents resource overload
- Scalability: Suitable for small to medium deployments
Design Decisions
Why Database-Backed?
No additional infrastructure required (Redis, message queues). Uses existing SQLite/Turso database, and jobs persist across server restarts.

Why Sequential Processing?
Prevents server resource exhaustion from hundreds of concurrent operations. Simplifies rate limiting for external APIs. Adequate for most use cases (1,000 items = 15-30 minutes).

Why 1-Second Polling?
Balance between responsiveness and database load. Adequate latency for background tasks. Can be adjusted if needed.

Limitations
- Not suitable for sub-second latency requirements
- Single-server deployment (no distributed workers)
- No built-in job scheduling (cron-like patterns)
- Sequential processing limits throughput
Migration Path
If scaling beyond single-server becomes necessary, clear upgrade paths exist:
- Redis Backend: Migrate to BullMQ for distributed processing
- PostgreSQL: Switch to pg-boss or Graphile Worker
- Cloud Queues: Move to AWS SQS, Google Cloud Tasks, etc.
Related Documentation
- Database Management - Database configuration and schema
- Global Event Bus - Event system for real-time notifications
- Logging - Logging best practices and patterns
- API Documentation - REST API endpoints and patterns