DeployStack Satellite sends periodic heartbeats to the Backend every 30 seconds to maintain connectivity status, report system health, and provide real-time visibility into running MCP server processes.

Overview

Purpose

The heartbeat system enables:
  • Health Monitoring: Track satellite connectivity and availability
  • System Metrics: Monitor CPU, memory, disk usage, and uptime
  • MCP Process Tracking: Real-time visibility into running MCP server instances
  • Tool Discovery Status: Track discovered tools and server availability
  • Automatic Activation: Satellites become active on first heartbeat

Communication Pattern

Satellite (Every 30s)          Backend
     │                            │
     │─── POST /heartbeat ───────▶│
     │    {                        │
     │      system_metrics,        │
     │      mcp_status,            │
     │      processes_by_team      │
     │    }                        │
     │                            │
     │◀── { success: true } ──────│
     │                            │
     │                            │─── Update satellites.last_heartbeat
     │                            │─── Insert satelliteHeartbeats record
     │                            │─── Store mcp_status JSON
Characteristics:
  • Interval: 30 seconds (fixed)
  • Transport: HTTPS POST with Bearer token authentication
  • Direction: Satellite → Backend (outbound-only, firewall-friendly)
  • Timeout: 10 seconds
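
The characteristics above can be sketched as a single outbound request. This is a minimal illustration, not the satellite's actual code: `buildHeartbeatRequest` and `sendHeartbeat` are hypothetical names, the route is the one listed under Backend Processing, and the sketch assumes a runtime with global `fetch` and `AbortController` (for example Node 18 or later).

```typescript
const HEARTBEAT_TIMEOUT_MS = 10_000; // 10-second request timeout

interface HeartbeatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the outbound POST; the satellite always initiates the connection.
function buildHeartbeatRequest(
  backendUrl: string,
  satelliteId: string,
  apiKey: string,
  payload: unknown,
): HeartbeatRequest {
  const base = backendUrl.replace(/\/+$/, ''); // tolerate a trailing slash
  return {
    url: `${base}/api/satellites/${encodeURIComponent(satelliteId)}/heartbeat`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(payload),
    },
  };
}

// Send one heartbeat, aborting if the backend does not answer in time.
async function sendHeartbeat(req: HeartbeatRequest): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), HEARTBEAT_TIMEOUT_MS);
  try {
    const res = await fetch(req.url, { ...req.init, signal: controller.signal });
    return res.ok;
  } catch {
    return false; // timeout or network error
  } finally {
    clearTimeout(timer);
  }
}
```

A caller could schedule this with `setInterval(() => sendHeartbeat(req), 30_000)` to match the fixed 30-second interval.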

Heartbeat Payload Structure

TypeScript Interface

interface HeartbeatPayload {
  status: 'active' | 'degraded' | 'error';
  system_metrics: SystemMetrics;
  processes: ProcessInfo[];  // Legacy HTTP proxy processes
  processes_by_team: Record<string, StdioProcessMetric[]>;
  normalized_data?: NormalizedHeartbeatData;  // Large-scale deployments
  mcp_status?: McpStatusSnapshot;  // MCP server runtime status
  error_count: number;
  version: string;
}

System Metrics

interface SystemMetrics {
  cpu_usage_percent: number;    // 0-100
  memory_usage_mb: number;       // Heap memory in MB
  disk_usage_percent: number;    // 0-100
  uptime_seconds: number;        // Process uptime
}
Collection:
  • Memory: From process.memoryUsage().heapUsed
  • Uptime: From process.uptime()
  • CPU/Disk: Placeholder (future implementation)
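
The collection rules above might look like the following sketch. The `ProcessLike` interface is a hypothetical narrowing of Node's `process` object so the function can be exercised with a fake. Reporting `0` for the placeholder fields and rounding memory to whole megabytes are assumptions, not confirmed behaviour.

```typescript
interface SystemMetrics {
  cpu_usage_percent: number;
  memory_usage_mb: number;
  disk_usage_percent: number;
  uptime_seconds: number;
}

// Minimal slice of Node's `process` object that the collector needs.
interface ProcessLike {
  memoryUsage(): { heapUsed: number };
  uptime(): number;
}

function collectSystemMetrics(proc: ProcessLike): SystemMetrics {
  const heapUsedBytes = proc.memoryUsage().heapUsed;
  return {
    cpu_usage_percent: 0, // placeholder (assumption: 0 until implemented)
    memory_usage_mb: Math.round(heapUsedBytes / (1024 * 1024)), // bytes to MB
    disk_usage_percent: 0, // placeholder (assumption: 0 until implemented)
    uptime_seconds: Math.floor(proc.uptime()), // whole seconds
  };
}
```

In a Node process this would be called as `collectSystemMetrics(process)`.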

MCP Status Snapshot

The mcp_status field provides lightweight runtime visibility into MCP servers:
interface McpStatusSnapshot {
  summary: {
    total_instances: number;      // Total running instances
    active_instances: number;     // Instances in 'running' state
    dormant_instances: number;    // Idle processes (cached for respawn)
    total_tools: number;          // Total discovered tools
    online_servers: number;       // Servers in 'online' state
    offline_servers: number;      // Servers in 'offline' state
    error_servers: number;        // Servers in 'error' state
  };
  instances: Array<{
    installation_id: string;      // Installation UUID
    installation_name: string;    // {server}-{team}-{user}-{id}
    instance_id: string;          // Per-user instance identifier
    user_id: string;              // User owning this instance
    team_id: string;              // Team identifier
    status: string;               // 'starting' | 'running' | 'failed'
    transport_type: string;       // 'stdio' | 'http' | 'sse'
    uptime_seconds: number;       // Instance uptime
    message_count: number;        // Total messages sent
    error_count: number;          // Total errors
    health_status: string;        // 'healthy' | 'unhealthy' | 'unknown'
    pid?: number;                 // Process ID (stdio only)
  }>;
  tool_names: Array<{
    namespaced_name: string;      // {server-slug}:{tool-name}
    server_slug: string;          // Server identifier
    transport: string;            // 'stdio' | 'http' | 'sse'
  }>;
  server_status_counts: {
    online: number;
    offline: number;
    error: number;
    requires_reauth: number;
  };
}
Key Characteristics:
  • Lightweight: Excludes tool descriptions/schemas (~8-15 KB vs ~200-500 KB)
  • Per-User Instances: Includes instance_id for granular tracking
  • Fresh Data: Replaced on each heartbeat (no accumulation)
  • Optional: Omitted when the satellite has no running MCP servers
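
To illustrate the "lightweight" point, the sketch below reduces a full tool record to the three identifying fields carried in `tool_names`. The `DiscoveredTool` shape, including its `description` and `input_schema` fields, is hypothetical and stands in for whatever the tool discovery layer actually returns.

```typescript
// Hypothetical shape of a fully discovered tool, including the heavy fields.
interface DiscoveredTool {
  namespaced_name: string;
  server_slug: string;
  transport: string;
  description: string;
  input_schema: Record<string, unknown>;
}

interface ToolNameEntry {
  namespaced_name: string;
  server_slug: string;
  transport: string;
}

// Keep only the identifying fields; descriptions and schemas are dropped.
function toToolNames(tools: DiscoveredTool[]): ToolNameEntry[] {
  return tools.map(({ namespaced_name, server_slug, transport }) => ({
    namespaced_name,
    server_slug,
    transport,
  }));
}
```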

Backend Processing

Endpoint

Route: POST /api/satellites/{satelliteId}/heartbeat
Authentication: Satellite API key (Bearer token)

Database Operations

1. Update Satellite Status and MCP Status
UPDATE satellites
SET
  last_heartbeat = NOW(),
  status = 'active',
  mcp_status = :mcp_status_json
WHERE id = :satelliteId;
2. Insert Heartbeat Record
INSERT INTO "satelliteHeartbeats" (
  id,
  satellite_id,
  status,
  system_metrics,     -- JSON string
  process_count,
  healthy_process_count,
  error_count,
  uptime_seconds,
  version,
  timestamp
) VALUES (...);
3. Update Process Statuses (if included)
  • Upserts records in satelliteProcesses table

Data Retention

Cleanup Policy:
  • Global satellites: Keep the 1000 most recent heartbeats (~8.3 hours at 30s intervals)
  • Team satellites: Keep the 500 most recent heartbeats (~4.2 hours at 30s intervals)
  • Cron job: Runs every 3 minutes
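
The retention windows follow directly from the record count and the 30-second interval. The sketch below works through that arithmetic and shows one way to pick the rows that fall outside the limit. `retentionHours`, `idsToDelete`, and the `HeartbeatRow` shape are hypothetical helpers for illustration; the actual cleanup job may work differently.

```typescript
const HEARTBEAT_INTERVAL_SECONDS = 30;

// Hours of history covered by keeping the N most recent heartbeats.
function retentionHours(
  keepCount: number,
  intervalSeconds: number = HEARTBEAT_INTERVAL_SECONDS,
): number {
  return (keepCount * intervalSeconds) / 3600;
}

interface HeartbeatRow {
  id: string;
  timestamp: number; // epoch milliseconds
}

// Return the ids of rows that fall outside the N most recent.
function idsToDelete(rows: HeartbeatRow[], keepCount: number): string[] {
  return [...rows]
    .sort((a, b) => b.timestamp - a.timestamp) // newest first
    .slice(keepCount) // everything past the retention limit
    .map((row) => row.id);
}
```

With these definitions, `retentionHours(1000)` is about 8.33 and `retentionHours(500)` is about 4.17, matching the figures in the cleanup policy.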

Database Schema

satellites Table

The satellites table stores the latest MCP status from the most recent heartbeat:
export const satellites = pgTable('satellites', {
  id: text('id').primaryKey(),
  name: text('name').notNull(),
  satellite_type: text('satellite_type', { enum: ['global', 'team'] }).notNull(),
  team_id: text('team_id'),
  status: text('status', { enum: ['active', 'inactive', 'maintenance', 'error'] }).notNull(),
  last_heartbeat: timestamp('last_heartbeat', { withTimezone: true }),
  mcp_status: text('mcp_status'),  // JSON: Latest MCP status (updated on each heartbeat)
  satellite_url: text('satellite_url').notNull(),
  region: text('region'),
  // ... other fields
});
Usage:
  • Latest Status: Query this table to get the current MCP status of a satellite
  • Real-Time View: Always reflects the most recent heartbeat data
  • Lightweight Query: No need to scan heartbeat history

satelliteHeartbeats Table

export const satelliteHeartbeats = pgTable('satelliteHeartbeats', {
  id: text('id').primaryKey(),
  satellite_id: text('satellite_id').notNull(),
  status: text('status', { enum: ['active', 'degraded', 'error'] }).notNull(),
  system_metrics: text('system_metrics').notNull(),  // JSON
  process_count: integer('process_count').notNull().default(0),
  healthy_process_count: integer('healthy_process_count').notNull().default(0),
  error_count: integer('error_count').notNull().default(0),
  response_time_ms: integer('response_time_ms'),
  uptime_seconds: integer('uptime_seconds'),
  version: text('version'),
  timestamp: timestamp('timestamp', { withTimezone: true }).notNull().defaultNow(),
});
Indexes:
  • (satellite_id, timestamp) - Query heartbeats for specific satellite
  • (status) - Filter by satellite status
  • (timestamp) - Time-series queries
Purpose:
  • Historical record of all heartbeats
  • Time-series analysis of system metrics
  • Audit trail for satellite connectivity

Querying Heartbeat Data

Query the satellites table for the latest MCP status:
SELECT
  id,
  name,
  status,
  last_heartbeat,
  mcp_status::json AS mcp_status
FROM satellites
WHERE id = 'sat-xxx';
Why use this?
  • Fastest query (single row lookup)
  • Always current (updated on each heartbeat)
  • No need to scan heartbeat history

Extract MCP Status Summary

SELECT
  id,
  name,
  mcp_status::json->'summary'->>'total_instances' AS total_instances,
  mcp_status::json->'summary'->>'active_instances' AS active_instances,
  mcp_status::json->'summary'->>'total_tools' AS total_tools,
  mcp_status::json->'summary'->>'online_servers' AS online_servers
FROM satellites
WHERE id = 'sat-xxx';
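
The same summary can be read on the application side after fetching the row. Because `mcp_status` is a text column that may be empty, a defensive parse is useful. This is a sketch only; `parseMcpSummary` is a hypothetical helper and the interface lists just a subset of the summary fields.

```typescript
interface McpStatusSummary {
  total_instances: number;
  active_instances: number;
  total_tools: number;
}

// Parse the text column; returns null when it is empty or not valid JSON.
function parseMcpSummary(raw: string | null): McpStatusSummary | null {
  if (!raw) return null;
  try {
    const parsed = JSON.parse(raw) as { summary?: McpStatusSummary };
    return parsed.summary ?? null;
  } catch {
    return null; // malformed JSON, or a JSON value without a summary
  }
}
```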

Get Per-Instance Details

SELECT
  id,
  name,
  jsonb_array_elements(mcp_status::jsonb->'instances') AS instance
FROM satellites
WHERE id = 'sat-xxx';

Get Historical System Metrics

Query the satelliteHeartbeats table for system metrics over time:
SELECT
  timestamp,
  system_metrics::json->'memory_usage_mb' AS memory_mb,
  system_metrics::json->'uptime_seconds' AS uptime,
  process_count,
  healthy_process_count
FROM "satelliteHeartbeats"  -- quoted: PostgreSQL folds unquoted identifiers to lowercase
WHERE satellite_id = 'sat-xxx'
ORDER BY timestamp DESC
LIMIT 100;

Monitor Heartbeat Lag

SELECT
  id,
  name,
  last_heartbeat,
  NOW() - last_heartbeat AS lag,
  status
FROM satellites
WHERE status = 'active'
ORDER BY lag DESC;
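
The lag value can also be turned into a simple staleness check in application code. In this sketch the threshold of three missed heartbeats (90 seconds) is an assumption chosen for illustration, not a documented DeployStack setting, and both function names are hypothetical.

```typescript
const HEARTBEAT_INTERVAL_MS = 30_000;

// Assumption: a satellite is treated as stale after 3 missed heartbeats.
const MISSED_BEATS_THRESHOLD = 3;

// Milliseconds since the last heartbeat; Infinity if none was ever received.
function heartbeatLagMs(lastHeartbeat: Date | null, now: Date): number {
  return lastHeartbeat ? now.getTime() - lastHeartbeat.getTime() : Infinity;
}

function isStale(lastHeartbeat: Date | null, now: Date): boolean {
  return heartbeatLagMs(lastHeartbeat, now) > HEARTBEAT_INTERVAL_MS * MISSED_BEATS_THRESHOLD;
}
```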

Testing & Debugging

Enable Debug Logging

cd services/satellite
LOG_LEVEL=debug npm run dev

Watch for Heartbeat Logs

Success:
{
  "level": "debug",
  "operation": "mcp_status_snapshot_collected",
  "total_instances": 5,
  "total_tools": 42,
  "msg": "Collected MCP status snapshot for heartbeat"
}

{
  "level": "debug",
  "satelliteId": "sat-xxx",
  "msg": "Heartbeat sent successfully"
}
Failure:
{
  "level": "error",
  "error": "Request timeout",
  "msg": "Failed to send heartbeat"
}

Verify Payload Size

Expected payload size: ~13-18 KB per heartbeat
  • System metrics: ~500 bytes
  • MCP status: ~8-15 KB (lightweight snapshot)
  • Processes by team: ~5-10 KB
Check the logs for unusually large payloads; these may indicate an issue with data collection.
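
One way to check the size is to measure the serialized JSON body before it is sent. This is a minimal sketch: `payloadSizeBytes` and `isOversized` are hypothetical helpers, and the 18 KB limit simply reuses the upper end of the expected range stated above.

```typescript
// UTF-8 size of the JSON body, in bytes.
function payloadSizeBytes(payload: unknown): number {
  return new TextEncoder().encode(JSON.stringify(payload)).length;
}

// Assumption: flag anything above the documented upper bound of ~18 KB.
const EXPECTED_MAX_BYTES = 18 * 1024;

function isOversized(payload: unknown): boolean {
  return payloadSizeBytes(payload) > EXPECTED_MAX_BYTES;
}
```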

Manual Heartbeat Trigger

Satellites send heartbeats automatically every 30 seconds. To verify one during testing:
  1. Start satellite
  2. Wait 30 seconds for first heartbeat
  3. Check backend logs for POST /api/satellites/{id}/heartbeat
  4. Query database for new record

Implementation Details

Satellite Side

File: services/satellite/src/services/heartbeat-service.ts
Key methods:
  • start() - Initializes 30s interval
  • sendHeartbeat() - Collects data and sends POST request
  • collectSystemMetrics() - Gathers memory and uptime (CPU and disk are placeholders)
  • collectStdioProcessesByTeam() - Groups stdio processes by team
  • collectMcpStatusSnapshot() - Builds lightweight MCP status
Dependencies:
  • RuntimeState - Process tracking
  • UnifiedToolDiscoveryManager - Tool and server status
  • BackendClient - HTTP communication

Backend Side

File: services/backend/src/routes/satellites/heartbeat.ts
Processing steps:
  1. Validate satellite exists
  2. Update satellites.last_heartbeat and status
  3. Insert satelliteHeartbeats record
  4. Parse and store mcp_status JSON
  5. Update satelliteProcesses if included
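
The processing steps can be sketched with in-memory stand-ins for the two tables. This is not the real route handler: every name below is hypothetical, step 5 (process upserts) is omitted, and setting `mcp_status` to `null` when the payload omits it is an assumption based on the "replaced on each heartbeat" behaviour.

```typescript
interface SatelliteRow {
  id: string;
  status: string;
  last_heartbeat: Date | null;
  mcp_status: string | null;
}

interface HeartbeatBody {
  status: 'active' | 'degraded' | 'error';
  error_count: number;
  version: string;
  mcp_status?: unknown;
}

interface HeartbeatRecord {
  satellite_id: string;
  status: string;
  error_count: number;
  version: string;
  timestamp: Date;
}

// In-memory stand-ins for the satellites and satelliteHeartbeats tables.
const satellitesTable = new Map<string, SatelliteRow>();
const heartbeatRecords: HeartbeatRecord[] = [];

function processHeartbeat(
  satelliteId: string,
  body: HeartbeatBody,
  now: Date,
): { success: boolean } {
  // 1. Validate that the satellite exists.
  const satellite = satellitesTable.get(satelliteId);
  if (!satellite) return { success: false };

  // 2. Update last_heartbeat and status.
  satellite.last_heartbeat = now;
  satellite.status = 'active';

  // 3. Insert a history record.
  heartbeatRecords.push({
    satellite_id: satelliteId,
    status: body.status,
    error_count: body.error_count,
    version: body.version,
    timestamp: now,
  });

  // 4. Store mcp_status as JSON text (assumption: null when omitted).
  satellite.mcp_status =
    body.mcp_status === undefined ? null : JSON.stringify(body.mcp_status);

  return { success: true };
}
```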

Troubleshooting

Heartbeat Not Being Sent

Check:
  • Is satellite registered? (runtimeState.satelliteId set?)
  • Backend URL correct? (DEPLOYSTACK_BACKEND_URL)
  • Network connectivity? (firewall, DNS)
  • API key valid? (not revoked)

Backend Rejecting Heartbeat

Check:
  • JSON schema validation errors in backend logs
  • Satellite ID exists in database
  • API key authentication header correct

Old Heartbeats Not Cleaned Up

Check:
  • Cleanup cron job running (cleanupSatelliteHeartbeats)
  • Worker logs for errors
  • Database retention limits configured

Large Payload Size

Expected: ~13-18 KB
If larger: Check mcp_status collection - may be including descriptions/schemas