Idle Process Management
DeployStack Satellite implements intelligent idle process management for stdio subprocess MCP servers. This system automatically terminates processes that remain inactive for extended periods and respawns them on demand, reducing memory usage while keeping servers available to users.
Purpose: Idle process management reduces memory consumption from constantly running MCP servers by terminating inactive processes. When a client needs a dormant server, the system automatically respawns it within 1-3 seconds, balancing resource efficiency against user experience.
Overview
Idle process management works through three coordinated systems:
- Idle Process Cleanup Job: Monitors running processes and terminates those exceeding idle timeout
- Dormant State Tracking: RuntimeState maintains configurations of terminated processes for quick respawning
- Automatic Respawning: ProcessManager respawns dormant processes when clients request them
Idle Detection & Termination
Idle Timeout Configuration
Processes are considered idle based on inactivity duration:
Default Timeout: 180 seconds (3 minutes), configurable
- lastActivity timestamp updated on every message sent or received
- Idle duration calculated as: now - lastActivity
- Only stdio transport processes are subject to idle termination
Idle Process Cleanup Job
The cleanup job runs automatically every 30 seconds:
Operation Flow:
- Retrieve all running stdio processes from ProcessManager
- Check each process against idle criteria
- Terminate idle processes and mark as dormant
- Update RuntimeState with dormant configurations
Idle Criteria (all must hold):
- Process status must be ‘running’ (skips ‘starting’, ‘terminating’, etc.)
- Process age must exceed spawn grace period (60 seconds)
- No active requests in flight
- Idle duration exceeds configured timeout (180 seconds default)
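The idle criteria above can be sketched as a single predicate. This is a minimal illustration, not the Satellite's actual code: the `TrackedProcess` shape, its field names, and the helper names are assumptions made for the example, while the 180-second and 60-second values are the defaults stated in this document.

```typescript
// Hypothetical process record; the real ProcessInfo type may differ.
interface TrackedProcess {
  installationName: string;
  transport: "stdio" | "http";
  status: "starting" | "running" | "terminating" | "failed";
  startedAt: number;      // epoch ms when the process was spawned
  lastActivity: number;   // epoch ms of the last message sent or received
  activeRequests: number; // requests currently in flight
}

const IDLE_TIMEOUT_MS = 180 * 1000; // default idle timeout from the text
const SPAWN_GRACE_MS = 60 * 1000;   // default spawn grace period from the text

// Returns true only when every idle criterion holds.
function isIdleCandidate(p: TrackedProcess, now: number): boolean {
  if (p.transport !== "stdio") return false;             // only stdio is subject to idle termination
  if (p.status !== "running") return false;              // skip starting, terminating, etc.
  if (now - p.startedAt <= SPAWN_GRACE_MS) return false; // still inside the grace period
  if (p.activeRequests > 0) return false;                // requests in flight
  return now - p.lastActivity > IDLE_TIMEOUT_MS;         // idle duration = now - lastActivity
}

// One sweep of the cleanup job: select the processes to terminate.
function findIdleProcesses(all: TrackedProcess[], now: number): TrackedProcess[] {
  return all.filter((p) => isIdleCandidate(p, now));
}
```

In this sketch the cleanup job would call `findIdleProcesses` on each 30-second tick and hand the result to the termination step.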
Spawn Grace Period
Newly spawned processes receive protection from immediate termination:
Grace Period: 60 seconds (configurable)
- Processes younger than the grace period cannot be marked idle
- Allows time for MCP handshake completion
- Prevents termination during tool discovery
- Ensures processes become fully operational before idle monitoring begins
Grace Period Purpose: Without the grace period, processes could be terminated during initialization, causing race conditions where the handshake completes but the process is immediately marked idle and terminated. The 60-second default provides ample time for handshake, tool discovery, and initial activity.
Termination Process
When a process exceeds the idle timeout:
Steps:
- Log idle duration and last activity timestamp
- Store process configuration in dormant map (RuntimeState)
- Emit mcp.server.dormant event to Backend
- Execute graceful process termination
- Tools remain cached for fast respawn (not cleared)
- Remove from active process tracking maps
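The ordering of these steps can be sketched as follows. The dependency interface, the `ServerConfig` shape, and the event payload fields are assumptions made for illustration; only the step order and the `mcp.server.dormant` event name come from the text.

```typescript
// Hypothetical minimal config; the real MCPServerConfig carries more fields.
interface ServerConfig {
  installationName: string;
  command: string;
  args: string[];
}

// Hypothetical collaborators injected so the sequence can be read in isolation.
interface IdleTerminationDeps {
  log: (message: string) => void;
  dormant: Map<string, ServerConfig>; // dormant map (RuntimeState)
  active: Map<string, ServerConfig>;  // active tracking (ProcessManager)
  emit: (eventName: string, payload: Record<string, unknown>) => void;
  terminate: (installationName: string) => Promise<void>;
}

async function terminateIdle(
  config: ServerConfig,
  lastActivity: number,
  now: number,
  deps: IdleTerminationDeps,
): Promise<void> {
  const idleSeconds = Math.round((now - lastActivity) / 1000);
  // 1. Log idle duration and last activity timestamp.
  deps.log(`${config.installationName} idle for ${idleSeconds}s, last activity ${new Date(lastActivity).toISOString()}`);
  // 2. Store the configuration so the process can be respawned later.
  deps.dormant.set(config.installationName, config);
  // 3. Notify the Backend.
  deps.emit("mcp.server.dormant", {
    installationName: config.installationName,
    idle_duration_seconds: idleSeconds,
  });
  // 4. Terminate gracefully. The tool cache is deliberately left untouched.
  await deps.terminate(config.installationName);
  // 5. Remove from active tracking.
  deps.active.delete(config.installationName);
}
```

Storing the dormant configuration before termination means a request that arrives mid-termination can still find a configuration to respawn from.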
Dormant State Management
Dormant Configuration Storage
RuntimeState maintains a separate map of dormant process configurations:
Stored Information:
- Complete MCPServerConfig (command, args, env, installation details)
- Allows identical respawn without Backend communication
- Remains in memory until process respawns or satellite restarts
Dormant vs Active Tracking
Active Processes:
- Tracked in ProcessManager maps (by ID, by name)
- Have active ProcessInfo with status, metrics, handlers
- Consume memory for process overhead and buffers
Dormant Processes:
- Only configuration stored in RuntimeState
- No active process or handlers
- Minimal memory footprint (~1-2KB per config)
- Tools remain in cache for instant availability
Dormant Process Queries
RuntimeState provides methods for dormant process inspection:
Query Methods:
- getDormantConfig(installationName): Retrieve specific config
- getDormantCount(): Count total dormant processes
- getAllDormantConfigs(): List all dormant configurations
Heartbeat Integration:
- Dormant count included in heartbeat data
- Enables Backend visibility into idle process management
- Tracks dormant vs active process ratio
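A dormant store with these query methods could be backed by a plain `Map` keyed by installation name. The sketch below is an assumption about structure, not the real RuntimeState: the class name, the config interface, and the `markDormant` / `clearDormant` helpers are hypothetical, while the three query method names are taken from the list above.

```typescript
// Hypothetical stand-in for MCPServerConfig.
interface DormantServerConfig {
  installationName: string;
  command: string;
  args: string[];
  env: Record<string, string>;
}

class DormantRegistry {
  private readonly dormant = new Map<string, DormantServerConfig>();

  // Hypothetical helper: record a config when its process is terminated for idleness.
  markDormant(config: DormantServerConfig): void {
    this.dormant.set(config.installationName, config);
  }

  // Hypothetical helper: drop a config once its process has respawned.
  clearDormant(installationName: string): boolean {
    return this.dormant.delete(installationName);
  }

  getDormantConfig(installationName: string): DormantServerConfig | undefined {
    return this.dormant.get(installationName);
  }

  getDormantCount(): number {
    return this.dormant.size;
  }

  getAllDormantConfigs(): DormantServerConfig[] {
    return Array.from(this.dormant.values());
  }
}
```

A heartbeat builder could then read `getDormantCount()` alongside the active process count to report the dormant vs active ratio.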
Automatic Respawning
Respawn Trigger
Dormant processes respawn automatically when clients request them:
Trigger Points:
- MCP client calls tools/list and process is dormant
- MCP client calls tools/call for tool on dormant server
- Any MCP request targeting dormant installation name
Respawn Process Flow
The respawn process follows the same path as the initial spawn, minus tool discovery:
- Respawn duration: 1-2 seconds (faster - no tool discovery needed)
- Includes handshake only (tools already cached)
- Client experiences minimal delay on first request after dormancy
Concurrent Respawn Prevention
Multiple concurrent requests to the same dormant process are handled safely:
Respawn Lock Mechanism:
- First request initiates respawn and stores Promise in map
- Subsequent requests await the same Promise
- All requests resolve when respawn completes
- Prevents duplicate spawning of same process
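The lock mechanism described above is a form of promise sharing: concurrent callers receive the same in-flight Promise. A minimal sketch follows, assuming a hypothetical `RespawnCoordinator` class and an injected `spawn` function; neither name comes from the Satellite codebase.

```typescript
class RespawnCoordinator<T> {
  // In-flight respawns keyed by installation name.
  private readonly inFlight = new Map<string, Promise<T>>();
  private readonly spawn: (installationName: string) => Promise<T>;

  constructor(spawn: (installationName: string) => Promise<T>) {
    this.spawn = spawn;
  }

  respawn(installationName: string): Promise<T> {
    // Subsequent requests await the Promise stored by the first request.
    const existing = this.inFlight.get(installationName);
    if (existing) return existing;

    // Calling spawn inside then() also captures synchronous throws as rejections.
    const tracked = Promise.resolve()
      .then(() => this.spawn(installationName))
      .then(
        (result) => {
          this.inFlight.delete(installationName); // remove respawn Promise on success
          return result;
        },
        (error) => {
          this.inFlight.delete(installationName); // and on failure, so the next request retries
          throw error;
        },
      );

    this.inFlight.set(installationName, tracked);
    return tracked;
  }

  pendingCount(): number {
    return this.inFlight.size;
  }
}
```

Because the map entry is removed on both success and failure, a failed respawn does not block later attempts for the same installation.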
Post-Respawn Cleanup
After successful respawn:
Cleanup Operations:
- Remove configuration from dormant map
- Add ProcessInfo to active tracking maps
- Emit mcp.server.respawned event
- Tools already cached (no rediscovery needed)
- Remove respawn Promise from tracking
Performance Characteristics
Memory Savings
Idle process management provides significant memory benefits:
Per-Process Savings:
- Active process: ~10-20MB (base Node.js + application)
- Dormant config: ~1-2KB (configuration only)
- Reduction: ~99% memory per idle process
Example Scenario:
- 100 MCP servers installed
- 10 actively used (100MB memory)
- 90 dormant (180KB memory)
- Total: ~100MB vs ~1.5GB if all active
Timing Impact
User Experience:
- Active process: Instant response (~10-50ms latency)
- Dormant process first request: 1-2 second delay (faster - no tool discovery)
- Subsequent requests: Instant (process remains active)
Respawn Time Breakdown:
- Process spawn: 500-1000ms
- MCP handshake: 500-1000ms
- Total: 1000-2000ms (tools already cached, no discovery needed)
Cleanup Efficiency
Job Performance:
- Runs every 30 seconds
- Checks all processes: <1ms per process
- Termination overhead: ~100ms per process
- Minimal CPU impact during normal operation
Configuration Best Practices
Idle Timeout Selection
Choose timeout based on usage patterns:
Short Timeout (60-120 seconds):
- High memory constraints
- Predictable usage patterns
- Acceptable respawn delay
- Many infrequently used servers
Medium Timeout:
- Balanced memory vs experience
- Mixed usage patterns
- Occasional bursty activity
- Recommended for most deployments
Long Timeout:
- Ample memory available
- Continuous or frequent usage
- Minimal respawn tolerance
- Mission-critical low latency
Grace Period Tuning
Adjust grace period based on environment:
Shorter Grace Period (30-45 seconds):
- Fast network connections
- Simple MCP servers (quick handshake)
- Aggressive memory optimization
Default Grace Period (60 seconds):
- Recommended for production
- Accounts for npx package downloads
- Handles slow network conditions
- Prevents initialization race conditions
Longer Grace Period:
- Slow network environments
- Complex MCP servers (large dependencies)
- Extra safety margin
Monitoring Recommendation: Track mcp.server.dormant and mcp.server.respawned events to tune the idle timeout. If processes frequently go dormant but respawn shortly after, increase the timeout. If processes stay dormant for long periods, the timeout is well-tuned.
Monitoring & Observability
Event-Based Monitoring
Track idle process management through Backend events:
Key Metrics:
- mcp.server.dormant: Count of processes entering dormant state
- mcp.server.respawned: Count of successful respawns
- idle_duration_seconds: Time process was inactive before termination
- dormant_duration_seconds: Time process spent dormant before respawn
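One way to use these metrics for timeout tuning is to measure how often a process respawns soon after going dormant. The sketch below assumes that each mcp.server.respawned event carries a `dormant_duration_seconds` field and that "soon" means under a caller-chosen threshold; the payload shape, the function name, and the threshold are all illustrative assumptions.

```typescript
// Hypothetical payload shape for an mcp.server.respawned event.
interface RespawnedEvent {
  installationName: string;
  dormant_duration_seconds: number;
}

// Fraction of respawns that happened shortly after the process went dormant.
// A high ratio suggests the idle timeout is too aggressive.
function quickRespawnRatio(events: RespawnedEvent[], quickThresholdSeconds: number): number {
  if (events.length === 0) return 0;
  const quick = events.filter((e) => e.dormant_duration_seconds < quickThresholdSeconds).length;
  return quick / events.length;
}
```

If most respawns fall under the threshold, processes are being terminated just before they are needed again, which is the signal to raise the idle timeout.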
Log Analysis
Important log operations to monitor include idle detection.
Heartbeat Data
Satellite heartbeat includes dormant process information:
Reported Metrics:
- Total active process count
- Total dormant process count
- Processes by status (running, starting, terminating, failed)
Error Handling
Respawn Failures
If respawn fails, standard error handling applies:
Failure Scenarios:
- Process spawn error
- Handshake timeout
- Invalid configuration
Failure Behavior:
- Respawn attempt logs error
- Dormant config remains in map
- Next request triggers another respawn attempt
- Auto-restart logic applies (3 attempts max)
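The key property of a failed respawn, that the dormant configuration stays in place so a later request can try again, can be sketched as below. The function name, the parameter shapes, and the boolean return value are assumptions for illustration, and the 3-attempt auto-restart limit is intentionally not modeled because the text does not specify how attempts are counted.

```typescript
// Hypothetical minimal config for the example.
interface RespawnableConfig {
  installationName: string;
  command: string;
}

async function attemptRespawn(
  installationName: string,
  dormant: Map<string, RespawnableConfig>,
  spawn: (config: RespawnableConfig) => Promise<void>,
  log: (message: string) => void,
): Promise<boolean> {
  const config = dormant.get(installationName);
  if (!config) return false; // nothing dormant under this name

  try {
    await spawn(config);
    dormant.delete(installationName); // only removed after a successful respawn
    return true;
  } catch (err) {
    // Spawn error, handshake timeout, or invalid configuration:
    // log it and leave the config in the dormant map for the next request.
    log(`respawn failed for ${installationName}: ${String(err)}`);
    return false;
  }
}
```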
Termination Failures
Idle termination handles edge cases:
Process Not Found:
- Logs warning (non-critical)
- Process already terminated externally
- Cleanup continues normally
Graceful Termination Timeout:
- SIGTERM sent first (10-second wait)
- SIGKILL sent if timeout exceeded
- Force termination ensures cleanup
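The SIGTERM-then-SIGKILL escalation can be sketched as below. The `KillableProcess` interface is a hypothetical subset modeled on the `kill` and `once("exit")` methods of Node's `ChildProcess`, so the function can be exercised with a fake; the function name and return values are assumptions, and the 10-second default is the wait stated above.

```typescript
// Hypothetical minimal process handle, modeled on Node's ChildProcess.
interface KillableProcess {
  kill(signal: "SIGTERM" | "SIGKILL"): boolean;
  once(event: "exit", listener: () => void): unknown;
}

function terminateGracefully(
  proc: KillableProcess,
  graceMs: number = 10000,
): Promise<"graceful" | "forced"> {
  return new Promise((resolve) => {
    let forced = false;

    // Escalate to SIGKILL if the process has not exited within the grace window.
    const timer = setTimeout(() => {
      forced = true;
      proc.kill("SIGKILL");
    }, graceMs);

    // Register the exit handler before sending any signal.
    proc.once("exit", () => {
      clearTimeout(timer);
      resolve(forced ? "forced" : "graceful");
    });

    proc.kill("SIGTERM");
  });
}
```

The returned value lets the caller log whether force termination was needed, which can help spot MCP servers that ignore SIGTERM.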
Development Testing
Manual Idle Testing
Test idle process management locally, for example by forcing idle termination.
Grace Period Testing
Verify that the grace period protects newly spawned processes from idle termination.
Production Considerations
Memory Planning
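Calculate expected memory usage:
Formula: total memory ≈ (active processes × memory per active process) + (dormant processes × memory per dormant config)
Example:
- 200 total MCP servers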
- 20 actively used (20 × 15MB = 300MB)
- 180 dormant (180 × 2KB = 360KB)
- Total: ~300MB vs ~3GB without idle management
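The arithmetic in this example can be reproduced with a small helper. The function name and default parameters are illustrative; the 15MB per active process and 2KB per dormant configuration are the figures used in the example above.

```typescript
// Estimated memory in MB for a mix of active processes and dormant configs.
function estimateMemoryMB(
  activeCount: number,
  dormantCount: number,
  activeMB: number = 15, // per active process, as in the example
  dormantKB: number = 2, // per dormant configuration, as in the example
): number {
  return activeCount * activeMB + (dormantCount * dormantKB) / 1024;
}

const withIdleManagement = estimateMemoryMB(20, 180); // 300MB active + 360KB dormant
const withoutIdleManagement = estimateMemoryMB(200, 0); // all 200 servers active, 3000MB
```

The dormant term contributes well under 1MB here, which is why the total is dominated by the number of processes that are active at the same time.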
Monitoring Alerts
Configure alerts for idle process health:
Alert Conditions:
- High respawn rate (>10 per minute): Timeout too aggressive
- High dormant count (>80% of total): Consider longer timeout
- Respawn failures: Configuration or resource issues
- Grace period violations: System overload or timing bugs
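These alert conditions could be evaluated from a periodic snapshot of counters. The snapshot shape, field names, and alert identifiers below are assumptions for the sketch; the thresholds (more than 10 respawns per minute, more than 80% dormant) are the ones listed above.

```typescript
// Hypothetical snapshot of counters collected by a monitoring system.
interface IdleHealthSnapshot {
  respawnsLastMinute: number;
  dormantCount: number;
  activeCount: number;
  respawnFailures: number;
  gracePeriodViolations: number;
}

// Returns the identifiers of every alert condition that currently holds.
function evaluateIdleAlerts(s: IdleHealthSnapshot): string[] {
  const alerts: string[] = [];
  const total = s.dormantCount + s.activeCount;

  if (s.respawnsLastMinute > 10) alerts.push("high-respawn-rate");
  if (total > 0 && s.dormantCount / total > 0.8) alerts.push("high-dormant-ratio");
  if (s.respawnFailures > 0) alerts.push("respawn-failures");
  if (s.gracePeriodViolations > 0) alerts.push("grace-period-violations");

  return alerts;
}
```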
Resource Limits
Coordinate with nsjail resource limits:
Production Limits:
- Per-process memory: 50MB (nsjail limit)
- System memory: Plan for peak active processes
- Dormant processes: Negligible memory impact
Related Documentation
- Process Management - Core stdio subprocess management
- Background Jobs - Job system architecture
- Event System - Event emission and tracking
- Tool Discovery - How tool caching interacts with dormancy