Overview
Purpose
The heartbeat system enables:
- Health Monitoring: Track satellite connectivity and availability
- System Metrics: Monitor CPU, memory, disk usage, and uptime
- MCP Process Tracking: Real-time visibility into running MCP server instances
- Tool Discovery Status: Track discovered tools and server availability
- Automatic Activation: Satellites become active on first heartbeat
Communication Pattern
- Interval: 30 seconds (fixed)
- Transport: HTTPS POST with Bearer token authentication
- Direction: Satellite → Backend (outbound-only, firewall-friendly)
- Timeout: 10 seconds
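The pattern above can be sketched as follows. This is a minimal illustration, not the satellite's actual code: the names `buildHeartbeatRequest`, `sendHeartbeat`, and `startHeartbeat` are hypothetical, and it assumes a runtime with global `fetch` and `AbortSignal.timeout` (Node.js 18 or later).

```typescript
// Hypothetical sketch of the communication pattern: POST every 30 s,
// Bearer token auth, 10 s timeout. Names are illustrative assumptions.
const HEARTBEAT_INTERVAL_MS = 30_000;
const HEARTBEAT_TIMEOUT_MS = 10_000;

interface HeartbeatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Pure helper: builds the request without sending it.
function buildHeartbeatRequest(
  backendUrl: string,
  satelliteId: string,
  apiKey: string,
  payload: unknown,
): HeartbeatRequest {
  const base = backendUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${base}/api/satellites/${encodeURIComponent(satelliteId)}/heartbeat`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  };
}

// Sends one heartbeat; resolves to false on HTTP error, network error, or timeout.
async function sendHeartbeat(req: HeartbeatRequest): Promise<boolean> {
  try {
    const res = await fetch(req.url, {
      method: req.method,
      headers: req.headers,
      body: req.body,
      signal: AbortSignal.timeout(HEARTBEAT_TIMEOUT_MS),
    });
    return res.ok;
  } catch {
    return false;
  }
}

// Starts the fixed-interval loop and returns a function that stops it.
function startHeartbeat(makeRequest: () => HeartbeatRequest): () => void {
  const timer = setInterval(() => {
    void sendHeartbeat(makeRequest());
  }, HEARTBEAT_INTERVAL_MS);
  return () => clearInterval(timer);
}
```

Because the satellite only opens outbound connections, a sketch like this needs no inbound port on the satellite host.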
Heartbeat Payload Structure
TypeScript Interface
System Metrics
- Memory: From process.memoryUsage().heapUsed
- Uptime: From process.uptime()
- CPU/Disk: Placeholder (future implementation)
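A minimal sketch of how these metrics could be gathered in Node.js. The field names (`memory_bytes`, `uptime_seconds`, `cpu_percent`, `disk_percent`) are assumptions made for illustration and may not match the real payload; only `process.memoryUsage().heapUsed` and `process.uptime()` come from the description above.

```typescript
// Hypothetical shape; field names are assumptions, not the actual payload schema.
interface SystemMetricsSketch {
  memory_bytes: number;        // heap currently in use, in bytes
  uptime_seconds: number;      // seconds since the process started
  cpu_percent: number | null;  // placeholder until implemented
  disk_percent: number | null; // placeholder until implemented
}

function collectSystemMetricsSketch(): SystemMetricsSketch {
  return {
    memory_bytes: process.memoryUsage().heapUsed,
    uptime_seconds: process.uptime(),
    cpu_percent: null,
    disk_percent: null,
  };
}
```

Note that `heapUsed` covers only the V8 heap of the satellite process itself, not total system memory.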
MCP Status Snapshot
The mcp_status field provides lightweight runtime visibility into MCP servers:
- Lightweight: Excludes tool descriptions/schemas (~8-15KB vs ~200-500KB)
- Per-User Instances: Includes instance_id for granular tracking
- Fresh Data: Replaced on each heartbeat (no accumulation)
- Optional: Missing if satellite has no running MCP servers
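The properties listed above can be illustrated with a small sketch. All type and field names other than `instance_id` are hypothetical. The idea is to keep tool names and counts, and to drop descriptions and schemas, which account for most of the size.

```typescript
// Hypothetical input types: what a satellite might hold in memory per server.
interface DiscoveredToolSketch {
  name: string;
  description: string;  // large; excluded from the snapshot
  inputSchema: unknown; // large; excluded from the snapshot
}

interface RunningServerSketch {
  name: string;
  instance_id: string;
  status: string;
  tools: DiscoveredToolSketch[];
}

// Hypothetical lightweight entry sent in the heartbeat.
interface McpServerStatusSketch {
  name: string;
  instance_id: string;
  status: string;
  tool_count: number;
  tool_names: string[];
}

// Builds a fresh snapshot on every call; returns undefined when no servers
// are running, so the optional field can be omitted from the payload.
function buildMcpStatusSnapshot(
  servers: RunningServerSketch[],
): McpServerStatusSketch[] | undefined {
  if (servers.length === 0) return undefined;
  return servers.map((server) => ({
    name: server.name,
    instance_id: server.instance_id,
    status: server.status,
    tool_count: server.tools.length,
    tool_names: server.tools.map((tool) => tool.name),
  }));
}
```

Since the snapshot is rebuilt from current state each time, nothing accumulates between heartbeats.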
Backend Processing
Endpoint
Route: POST /api/satellites/{satelliteId}/heartbeat
Authentication: Satellite API key (Bearer token)
Database Operations
1. Update Satellite Status and MCP Status
- Upserts records in the satelliteProcesses table
Data Retention
Cleanup Policy:
- Global satellites: Keep 1000 most recent heartbeats (~8.3 hours at 30s intervals)
- Team satellites: Keep 500 most recent heartbeats (~4.2 hours)
- Cron job: Runs every 3 minutes
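The retention windows follow directly from the record limit and the 30-second interval. The sketch below shows the arithmetic and one way a "keep the N most recent" policy could be applied to a single satellite's rows. Both function names are hypothetical, and the real cleanup job presumably works against the database rather than an in-memory array.

```typescript
// Hours of history covered by a given number of heartbeats.
function retentionHours(maxHeartbeats: number, intervalSeconds = 30): number {
  return (maxHeartbeats * intervalSeconds) / 3600;
}

// Keeps only the `keep` most recent rows (by timestamp) for one satellite.
function pruneHeartbeats<T extends { timestamp: number }>(rows: T[], keep: number): T[] {
  return [...rows].sort((a, b) => b.timestamp - a.timestamp).slice(0, keep);
}
```

With these definitions, `retentionHours(1000)` is about 8.33 and `retentionHours(500)` is about 4.17, matching the figures above.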
Database Schema
satellites Table
The satellites table stores the latest MCP status from the most recent heartbeat:
- Latest Status: Query this table to get the current MCP status of a satellite
- Real-Time View: Always reflects the most recent heartbeat data
- Lightweight Query: No need to scan heartbeat history
satelliteHeartbeats Table
Indexes:
- (satellite_id, timestamp) - Query heartbeats for a specific satellite
- (status) - Filter by satellite status
- (timestamp) - Time-series queries
Purpose:
- Historical record of all heartbeats
- Time-series analysis of system metrics
- Audit trail for satellite connectivity
Querying Heartbeat Data
Get Current MCP Status (Recommended)
Query the satellites table for the latest MCP status:
- Fastest query (single row lookup)
- Always current (updated on each heartbeat)
- No need to scan heartbeat history
Extract MCP Status Summary
Get Per-Instance Details
Get Historical System Metrics
Query the satelliteHeartbeats table for system metrics over time:
Monitor Heartbeat Lag
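A minimal sketch of lag monitoring, assuming the last heartbeat time has already been read from the satellites table. The function names and the default threshold of three missed intervals are illustrative assumptions, not values from the system.

```typescript
const HEARTBEAT_INTERVAL_SECONDS = 30;

// Seconds elapsed since the last heartbeat (never negative).
function heartbeatLagSeconds(lastHeartbeat: Date, now: Date = new Date()): number {
  return Math.max(0, (now.getTime() - lastHeartbeat.getTime()) / 1000);
}

// Treats a satellite as stale once it has missed more than `missedIntervals`
// consecutive heartbeats. The default of 3 is an assumption.
function isHeartbeatStale(
  lastHeartbeat: Date,
  now: Date = new Date(),
  missedIntervals = 3,
): boolean {
  return heartbeatLagSeconds(lastHeartbeat, now) > missedIntervals * HEARTBEAT_INTERVAL_SECONDS;
}
```

A lag slightly above 30 seconds is normal given network latency; a threshold of a few intervals avoids flagging a single delayed request.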
Testing & Debugging
Enable Debug Logging
Watch for Heartbeat Logs
Success:
Verify Payload Size
Expected payload size: ~13-18 KB per heartbeat
- System metrics: ~500 bytes
- MCP status: ~8-15 KB (lightweight snapshot)
- Processes by team: ~5-10 KB
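To check these numbers against a real payload, the serialized size can be measured per top-level field. This is a small sketch with hypothetical helper names; it measures UTF-8 bytes of the JSON encoding, which is what goes over the wire before any compression.

```typescript
// UTF-8 byte length of a value's JSON encoding.
function jsonSizeBytes(value: unknown): number {
  return new TextEncoder().encode(JSON.stringify(value) ?? "").length;
}

// Size of each top-level field, to see which section dominates the payload.
function sizeByField(payload: Record<string, unknown>): Record<string, number> {
  const sizes: Record<string, number> = {};
  for (const [key, value] of Object.entries(payload)) {
    sizes[key] = jsonSizeBytes(value);
  }
  return sizes;
}
```

If the mcp_status entry is far above the expected range, the snapshot is likely carrying tool descriptions or schemas.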
Manual Heartbeat Trigger
Satellites send heartbeats automatically every 30 seconds. For testing:
- Start satellite
- Wait 30 seconds for first heartbeat
- Check backend logs for POST /api/satellites/{id}/heartbeat
- Query database for new record
Implementation Details
Satellite Side
File: services/satellite/src/services/heartbeat-service.ts
Key methods:
- start() - Initializes 30s interval
- sendHeartbeat() - Collects data and sends POST request
- collectSystemMetrics() - Gathers CPU, memory, uptime
- collectStdioProcessesByTeam() - Groups stdio processes by team
- collectMcpStatusSnapshot() - Builds lightweight MCP status
Dependencies:
- RuntimeState - Process tracking
- UnifiedToolDiscoveryManager - Tool and server status
- BackendClient - HTTP communication
Backend Side
File: services/backend/src/routes/satellites/heartbeat.ts
Processing steps:
- Validate satellite exists
- Update satellites.last_heartbeat and status
- Insert satelliteHeartbeats record
- Parse and store mcp_status JSON
- Update satelliteProcesses if included
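The processing steps can be sketched against an in-memory stand-in for the database. Everything here is an assumption made for illustration: the row shapes, the `processes` field name, the "active" status value, and the `processHeartbeat` function itself. It is not the actual route handler.

```typescript
// In-memory stand-ins for the tables; shapes are hypothetical.
interface SatelliteRowSketch {
  status: string;
  last_heartbeat: Date | null;
  mcp_status: unknown;
}

interface HeartbeatPayloadSketch {
  system_metrics: Record<string, number>;
  mcp_status?: unknown;
  processes?: { id: string; status: string }[];
}

interface StoreSketch {
  satellites: Map<string, SatelliteRowSketch>;
  satelliteHeartbeats: { satellite_id: string; timestamp: Date; system_metrics: Record<string, number> }[];
  satelliteProcesses: Map<string, { satellite_id: string; status: string }>;
}

type ProcessResult = { ok: true } | { ok: false; error: string };

function processHeartbeat(
  store: StoreSketch,
  satelliteId: string,
  payload: HeartbeatPayloadSketch,
  now: Date = new Date(),
): ProcessResult {
  // 1. Validate that the satellite exists.
  const satellite = store.satellites.get(satelliteId);
  if (!satellite) return { ok: false, error: "satellite not found" };

  // 2. Update last_heartbeat and status (first heartbeat activates the satellite).
  satellite.last_heartbeat = now;
  satellite.status = "active";

  // 3. Insert a history record.
  store.satelliteHeartbeats.push({
    satellite_id: satelliteId,
    timestamp: now,
    system_metrics: payload.system_metrics,
  });

  // 4. Store the latest mcp_status, replacing the previous snapshot.
  satellite.mcp_status = payload.mcp_status ?? null;

  // 5. Upsert process records when the payload includes them.
  for (const proc of payload.processes ?? []) {
    store.satelliteProcesses.set(`${satelliteId}:${proc.id}`, {
      satellite_id: satelliteId,
      status: proc.status,
    });
  }
  return { ok: true };
}
```

In a real backend these writes would typically run in a single transaction so that a failed insert does not leave the satellite marked active without a matching heartbeat record.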
Troubleshooting
Heartbeat Not Being Sent
Check:
- Is satellite registered? (runtimeState.satelliteId set?)
- Backend URL correct? (DEPLOYSTACK_BACKEND_URL)
- Network connectivity? (firewall, DNS)
- API key valid? (not revoked)
Backend Rejecting Heartbeat
Check:
- JSON schema validation errors in backend logs
- Satellite ID exists in database
- API key authentication header correct
Old Heartbeats Not Cleaned Up
Check:
- Cleanup cron job running (cleanupSatelliteHeartbeats)
- Worker logs for errors
- Database retention limits configured
Large Payload Size
Expected: ~13-18 KB
If larger: Check mcp_status collection - may be including descriptions/schemas
Related Documentation
- Satellite Registration - Initial satellite setup
- Process Management - MCP process lifecycle
- Tool Discovery - How tools are discovered
- Event Emission - Process lifecycle events

