
# Satellite Architecture Design

> Complete architectural overview of DeployStack Satellite - per-user MCP instance management with dual deployment support.

DeployStack Satellite is an edge worker service that manages **per-user MCP server instances** with dual deployment support: HTTP proxy for external endpoints and stdio subprocess for local MCP servers. Each team member gets their own isolated instance with merged configuration (Template + Team + User).
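The three-layer merge can be pictured with a minimal sketch. The `ConfigLayer` type and `mergeInstanceConfig` function are hypothetical names, and the precedence shown (User over Team over Template) is an assumption for illustration; this page does not state the actual override order.

```typescript
// Hypothetical shape: each layer is a flat map of settings (args, env vars, etc.).
type ConfigLayer = Record<string, string>;

// Merge the three layers; later layers win on key conflicts.
// Assumed precedence (not confirmed by this page): Template < Team < User.
function mergeInstanceConfig(
  template: ConfigLayer,
  team: ConfigLayer,
  user: ConfigLayer,
): ConfigLayer {
  return { ...template, ...team, ...user };
}

const merged = mergeInstanceConfig(
  { LOG_LEVEL: "info", REGION: "us" }, // template defaults
  { REGION: "eu" },                    // team-wide override
  { API_KEY: "user-secret" },          // user-level value
);
console.log(merged);
```

Each user's instance would receive its own merged result, which is why two teammates can run the same MCP server with different credentials.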

## Technical Overview

### Edge Worker Pattern

Satellites operate as edge workers similar to GitHub Actions runners, providing:

* **MCP Transport Protocols**: SSE, Streamable HTTP, Direct HTTP communication
* **Per-User Instance Management**: Each team member has their own MCP server instance (implemented)
* **Dual MCP Server Management**: HTTP proxy + stdio subprocess support (implemented)
* **Team and User Isolation**: Per-user process isolation with independent status tracking (implemented)
* **OAuth 2.1 Resource Server**: Token introspection with Backend for team and user context (implemented)
* **Backend Polling Communication**: Outbound-only, firewall-friendly
* **Real-Time Event System**: Immediate satellite → backend event emission with automatic batching
* **Process Lifecycle Management**: Per-user spawn, monitor, terminate with independent lifecycles (implemented)
* **Background Jobs System**: Cron-like recurring tasks with automatic error handling

## Current Implementation Architecture

### MCP SDK Transport Layer

The satellite uses the official `@modelcontextprotocol/sdk` for all MCP client communication:

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                        Official MCP SDK Implementation                         │
│                                                                                 │
│  ┌─────────────────────────────────────────────────────────────────────────┐   │
│  │                        MCP SDK Server                                   │   │
│  │                                                                         │   │
│  │  • StreamableHTTPServerTransport    • Standard JSON-RPC handling       │   │
│  │  • Automatic session management     • Built-in error responses         │   │
│  │  • Protocol 2025-03-26 compliance   • SSE streaming support            │   │
│  └─────────────────────────────────────────────────────────────────────────┘   │
│                                                                                 │
│  ┌─────────────────────────────────────────────────────────────────────────┐   │
│  │                     MCP Client Integration                              │   │
│  │                                                                         │   │
│  │  • StreamableHTTPClientTransport    • External server discovery        │   │
│  │  • Automatic connection cleanup     • Tool discovery caching           │   │
│  │  • Standard MCP method support      • Process communication            │   │
│  └─────────────────────────────────────────────────────────────────────────┘   │
│                                                                                 │
│  ┌─────────────────────────────────────────────────────────────────────────┐   │
│  │                    Foundation Infrastructure                            │   │
│  │                                                                         │   │
│  │  • Fastify HTTP Server with JSON Schema validation                     │   │
│  │  • Pino structured logging with operation tracking                     │   │
│  │  • TypeScript + Webpack build system                                   │   │
│  │  • Environment configuration with .env support                        │   │
│  └─────────────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────────────┘
```

### MCP Transport Endpoints

**Active Endpoints:**

* `GET /mcp` - Establish SSE stream via MCP SDK
* `POST /mcp` - Send JSON-RPC messages via MCP SDK
* `DELETE /mcp` - Session termination via MCP SDK

**Transport Protocol Support:**

```
MCP Client                    Satellite (MCP SDK)
    │                            │
    │──── POST /mcp ────────────▶│  (Initialize connection)
    │                            │
    │◀─── Session headers ──────│  (Session established)
    │                            │
    │──── POST /mcp ────────────▶│  (JSON-RPC tools/list)
    │                            │
    │◀─── 4 meta-tools ─────────│  (Hierarchical router)
```
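The exchange above can be sketched from the client side as plain request construction. The helper names `buildRequest` and `buildHeaders` are hypothetical; the `Mcp-Session-Id` header and the `Accept` value follow the MCP Streamable HTTP transport specification, and the client name is a placeholder.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Build a JSON-RPC 2.0 request body; `params` is omitted when not supplied.
function buildRequest(
  id: number,
  method: string,
  params?: Record<string, unknown>,
): JsonRpcRequest {
  return params === undefined
    ? { jsonrpc: "2.0", id, method }
    : { jsonrpc: "2.0", id, method, params };
}

// Headers for POST /mcp. The session header is only sent once the server has issued one.
function buildHeaders(sessionId?: string): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
  };
  if (sessionId !== undefined) headers["Mcp-Session-Id"] = sessionId;
  return headers;
}

// Step 1: initialize (no session yet). Step 2: tools/list with the returned session ID.
const initialize = buildRequest(1, "initialize", {
  protocolVersion: "2025-03-26",
  capabilities: {},
  clientInfo: { name: "example-client", version: "0.0.0" },
});
const listTools = buildRequest(2, "tools/list");
```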

### Core SDK Components

**MCP Server Wrapper:**

* Official SDK Server integration with Fastify
* Standard MCP protocol method handlers
* Automatic session and transport management
* Integration with existing tool discovery and process management

**Client Communication:**

* StreamableHTTPClientTransport for external server communication
* Automatic connection establishment and cleanup
* Standard MCP method execution (listTools, callTool, listResources, readResource)
* Built-in error handling and retry logic

### MCP Protocol Implementation

**Supported MCP Methods:**

* `initialize` - MCP session initialization (SDK automatic)
* `notifications/initialized` - Client initialization complete
* `tools/list` - List available meta-tools (hierarchical router: 4 meta-tools)
* `tools/call` - Execute meta-tools or route to actual MCP servers
* `resources/list` - List available resources from all connected MCP servers
* `resources/templates/list` - List resource templates from all connected MCP servers
* `resources/read` - Read resource content (proxied on-demand to origin server)
* `prompts/list` - List available prompts (returns empty array)

<Info>
  **Hierarchical Router**: The satellite exposes only 4 meta-tools to MCP clients (`discover_mcp_tools`, `execute_mcp_tool`, `list_mcp_resources`, and `read_mcp_resource`) instead of all available tools and resources. This solves the MCP context window consumption problem by reducing token usage by 95%+. Resources and `_meta` metadata are also proxied through the hierarchical router for MCP Apps support. See [Hierarchical Router Implementation](/development/satellite/hierarchical-router) for details.
</Info>
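To make the routing idea concrete, here is a minimal in-memory sketch of a discover-then-execute router. The class name, the `server:tool` path format, and the substring search are all assumptions for illustration; the real meta-tool schemas are described on the Hierarchical Router page.

```typescript
// Hypothetical index entry for a tool discovered on one of the user's MCP servers.
interface IndexedTool {
  server: string;      // which MCP server owns the tool
  name: string;        // tool name on that server
  description: string;
}

type ToolHandler = (args: Record<string, unknown>) => unknown;

class HierarchicalRouterSketch {
  private tools = new Map<string, { meta: IndexedTool; handler: ToolHandler }>();

  register(meta: IndexedTool, handler: ToolHandler): void {
    this.tools.set(`${meta.server}:${meta.name}`, { meta, handler });
  }

  // Backs a discovery meta-tool: case-insensitive substring match on name and description.
  discover(query: string): IndexedTool[] {
    const q = query.toLowerCase();
    return Array.from(this.tools.values())
      .map((entry) => entry.meta)
      .filter((t) => `${t.name} ${t.description}`.toLowerCase().indexOf(q) !== -1);
  }

  // Backs an execution meta-tool: route the call to the owning server's handler.
  execute(toolPath: string, args: Record<string, unknown>): unknown {
    const entry = this.tools.get(toolPath);
    if (!entry) throw new Error(`Unknown tool: ${toolPath}`);
    return entry.handler(args);
  }
}
```

The client only ever sees the small, fixed set of meta-tools; the full tool catalog stays inside the router and is searched on demand, which is where the token savings come from.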

For detailed information about internal tool discovery and caching, see [Tool Discovery Implementation](/development/satellite/tool-discovery).

**Error Handling:**

* Standard JSON-RPC 2.0 compliant error responses via SDK
* Automatic HTTP status code mapping
* Structured error logging with operation tracking
* Built-in session validation and error reporting
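As a rough illustration of error responses and status mapping, the sketch below uses the error codes defined by the JSON-RPC 2.0 specification. The `httpStatusFor` mapping is an assumption made for this example; the SDK's actual status codes may differ.

```typescript
// Standard error codes from the JSON-RPC 2.0 specification.
const JSON_RPC_ERRORS = {
  PARSE_ERROR: -32700,
  INVALID_REQUEST: -32600,
  METHOD_NOT_FOUND: -32601,
  INVALID_PARAMS: -32602,
  INTERNAL_ERROR: -32603,
} as const;

interface JsonRpcErrorResponse {
  jsonrpc: "2.0";
  id: string | number | null;
  error: { code: number; message: string };
}

// Build a spec-shaped error response; `id` is null when the request ID is unknown.
function errorResponse(
  id: string | number | null,
  code: number,
  message: string,
): JsonRpcErrorResponse {
  return { jsonrpc: "2.0", id, error: { code, message } };
}

// Assumed mapping for illustration only.
function httpStatusFor(code: number): number {
  switch (code) {
    case JSON_RPC_ERRORS.PARSE_ERROR:
    case JSON_RPC_ERRORS.INVALID_REQUEST:
    case JSON_RPC_ERRORS.INVALID_PARAMS:
      return 400;
    case JSON_RPC_ERRORS.METHOD_NOT_FOUND:
      return 404;
    default:
      return 500;
  }
}
```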

## Planned Full Architecture

### Three-Tier System Design

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                        MCP Client Layer                                        │
│                     (VS Code, Claude, etc.)                                    │
│                                                                                 │
│  Connects via: SSE, Streamable HTTP, Direct HTTP Tools                        │
└─────────────────────────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│                      Satellite Layer                                           │
│                   (Edge Processing)                                            │
│                                                                                 │
│  ┌─────────────────────────────────────────┐                                   │
│  │        Global Satellite                 │                                   │
│  │  (Operated by DeployStack Team)         │                                   │
│  │      (Serves All Teams)                 │                                   │
│  └─────────────────────────────────────────┘                                   │
│                                                                                 │
│  ┌─────────────────────────────────────────┐                                   │
│  │        Team Satellite                   │                                   │
│  │   (Customer-Deployed)                   │                                   │
│  │   (Serves Single Team)                  │                                   │
│  └─────────────────────────────────────────┘                                   │
└─────────────────────────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│                       Backend Layer                                            │
│                  (Central Management)                                          │
│                                                                                 │
│  ┌─────────────────────────────────────────────────────────────────────────┐   │
│  │                    DeployStack Backend                                  │   │
│  │                  (cloud.deploystack.io)                                │   │
│  │                                                                         │   │
│  │  • Command orchestration    • Configuration management                 │   │
│  │  • Status monitoring        • Team & role management                   │   │
│  │  • Usage analytics          • Security & compliance                    │   │
│  └─────────────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────────────┘
```

### Satellite Internal Architecture (Planned)

Each satellite instance will contain five core components:

```
┌─────────────────────────────────────────────────────────────────┐
│                    Satellite Instance                           │
│                                                                 │
│  ┌─────────────────┐    ┌─────────────────┐                   │
│  │  HTTP Proxy     │    │  MCP Server     │                   │
│  │    Router       │    │    Manager      │                   │
│  │                 │    │                 │                   │
│  │ • Team-aware    │    │ • Process       │                   │
│  │ • OAuth 2.1     │    │   Lifecycle     │                   │
│  │ • Load Balance  │    │ • stdio Comm    │                   │
│  └─────────────────┘    └─────────────────┘                   │
│                                                                 │
│  ┌─────────────────┐    ┌─────────────────┐                   │
│  │  Team Resource  │    │   Backend       │                   │
│  │    Manager      │    │ Communicator    │                   │
│  │                 │    │                 │                   │
│  │ • Namespaces    │    │ • HTTP Polling  │                   │
│  │ • rlimits       │    │ • Config Sync   │                   │
│  │ • Isolation     │    │ • Status Report │                   │
│  └─────────────────┘    └─────────────────┘                   │
│                                                                 │
│  ┌─────────────────────────────────────────┐                   │
│  │        Communication Manager            │                   │
│  │                                         │                   │
│  │ • JSON-RPC stdio    • HTTP Proxy       │                   │
│  │ • Process IPC       • Client Routing   │                   │
│  └─────────────────────────────────────────┘                   │
└─────────────────────────────────────────────────────────────────┘
```

## Deployment Models

### Global Satellites

**Operated by DeployStack Team:**

* **Infrastructure**: Cloud-hosted (AWS, GCP, Azure)
* **Scope**: Serve all teams with resource isolation
* **Scaling**: Auto-scaling based on demand
* **Management**: Centralized by DeployStack operations
* **Use Case**: Teams wanting shared infrastructure

**Architecture Benefits:**

* **Zero Installation**: URL-based configuration
* **Instant Availability**: No setup or deployment required
* **Automatic Updates**: Invisible to users
* **Global Scale**: Multi-region deployment

### Team Satellites

**Customer-Deployed:**

* **Infrastructure**: Customer's corporate networks
* **Scope**: Single team exclusive access
* **Scaling**: Customer-controlled resources
* **Management**: Team administrators
* **Use Case**: Internal resource access, compliance requirements

**Architecture Benefits:**

* **Internal Access**: Company databases, APIs, file systems
* **Data Sovereignty**: Data never leaves corporate network
* **Complete Control**: Customer owns infrastructure
* **Compliance Ready**: Meets enterprise security requirements

## Communication Patterns

### Client-to-Satellite Communication

**Multiple Transport Protocols:**

* **SSE (Server-Sent Events)**: Real-time streaming with session management
* **Streamable HTTP**: Chunked responses with optional sessions
* **Direct HTTP Tools**: Standard REST API calls

**Current Implementation:**

```
MCP Client                    Satellite
    │                            │
    │──── GET /sse ─────────────▶│  (Establish SSE connection)
    │                            │
    │◀─── event: endpoint ──────│  (Session URL + heartbeat)
    │                            │
    │──── POST /message ────────▶│  (JSON-RPC via session)
    │                            │
    │◀─── Response via SSE ─────│  (Stream JSON-RPC response)
```

**Session Management:**

* **Session ID**: 32-byte cryptographically secure identifier
* **Timeout**: 30-minute automatic cleanup
* **Activity Tracking**: Updated on each message
* **State Management**: Client info and initialization status
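The session rules above can be sketched as a small in-memory store. `SessionStoreSketch` and its method names are hypothetical; only the 32-byte identifier, the 30-minute timeout, and per-message activity tracking come from this page.

```typescript
import { randomBytes } from "node:crypto";

const SESSION_TIMEOUT_MS = 30 * 60 * 1000; // 30 minutes of inactivity

interface Session {
  id: string;
  lastActivity: number; // epoch milliseconds
  initialized: boolean;
}

class SessionStoreSketch {
  private sessions = new Map<string, Session>();
  private now: () => number;

  // `now` is injectable so the timeout logic can be exercised without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  create(): Session {
    // 32 random bytes, base64url encoded (43 characters, no padding).
    const id = randomBytes(32).toString("base64url");
    const session: Session = { id, lastActivity: this.now(), initialized: false };
    this.sessions.set(id, session);
    return session;
  }

  // Record activity on each message; returns undefined for unknown or expired sessions.
  touch(id: string): Session | undefined {
    const session = this.sessions.get(id);
    if (!session) return undefined;
    if (this.now() - session.lastActivity > SESSION_TIMEOUT_MS) {
      this.sessions.delete(id);
      return undefined;
    }
    session.lastActivity = this.now();
    return session;
  }

  // Periodic sweep that removes idle sessions; returns how many were removed.
  cleanup(): number {
    let removed = 0;
    for (const [id, session] of Array.from(this.sessions.entries())) {
      if (this.now() - session.lastActivity > SESSION_TIMEOUT_MS) {
        this.sessions.delete(id);
        removed++;
      }
    }
    return removed;
  }
}
```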

### Satellite-to-Backend Communication

**HTTP Polling Pattern:**

```
Satellite                    Backend
   │                           │
   │──── GET /api/satellites/{id}/commands ──▶│  (Poll for commands)
   │                           │
   │◀─── Commands Response ────│  (Configuration, tasks)
   │                           │
   │──── POST /api/satellites/{id}/heartbeat ─▶│  (Report status, metrics)
   │                           │
   │◀─── Acknowledgment ───────│  (Confirm receipt)
```

**Communication Features:**

* **Outbound Only**: Firewall-friendly
* **Priority-Based Polling**: Four modes (immediate/high/normal/slow) with automatic transitions
* **Command Queue**: Priority-based task processing with expiration and correlation IDs
* **Status Reporting**: Real-time health and metrics every 30 seconds
* **Configuration Sync**: Dynamic MCP server configuration updates
* **Error Recovery**: Exponential backoff with maximum 5-minute intervals
* **3-Second Response Time**: Immediate priority commands enable near real-time responses
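A minimal sketch of how a polling delay could be derived from the mode and the failure count follows. The per-mode intervals other than the 3-second immediate mode are placeholders, and the doubling rule is one common form of exponential backoff, assumed here for illustration; only the 5-minute cap is taken from the list above.

```typescript
type PollingMode = "immediate" | "high" | "normal" | "slow";

// Placeholder intervals for illustration, not the satellite's real settings.
const POLL_INTERVAL_MS: Record<PollingMode, number> = {
  immediate: 3000,
  high: 10000,
  normal: 30000,
  slow: 120000,
};

const MAX_BACKOFF_MS = 5 * 60 * 1000; // cap from the list above

// Delay before the next poll: the mode's interval doubles for each
// consecutive failure and is capped at five minutes.
function nextPollDelay(mode: PollingMode, consecutiveFailures: number): number {
  const base = POLL_INTERVAL_MS[mode];
  return Math.min(base * Math.pow(2, consecutiveFailures), MAX_BACKOFF_MS);
}

console.log(nextPollDelay("normal", 0));  // 30000
console.log(nextPollDelay("normal", 2));  // 120000
console.log(nextPollDelay("normal", 10)); // 300000 (capped)
```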

For complete implementation details, see [Backend Polling Implementation](/development/satellite/polling).

### Real-Time Event System

The satellite emits typed events for status changes, logs, and tool metadata. Events enable real-time monitoring without polling.

**Difference from Heartbeat:**

* **Heartbeat** (every 30s): Aggregate metrics, system health, resource usage
* **Events** (immediate): Point-in-time status updates, precise timestamps

See [Event Emission](/development/satellite/event-emission) for complete event types, payloads, and batching configuration.
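The idea of immediate emission with batching can be sketched as a queue that timestamps each event when it occurs and flushes once a size threshold is reached. `EventBatcherSketch`, the event shape, and the size-based trigger are assumptions; the real event types and batching settings are on the Event Emission page.

```typescript
interface SatelliteEvent {
  type: string;                  // placeholder event name
  timestamp: string;             // ISO 8601, captured at emission time
  data: Record<string, unknown>;
}

class EventBatcherSketch {
  private queue: SatelliteEvent[] = [];
  private maxBatchSize: number;
  private send: (batch: SatelliteEvent[]) => void;

  constructor(maxBatchSize: number, send: (batch: SatelliteEvent[]) => void) {
    this.maxBatchSize = maxBatchSize;
    this.send = send;
  }

  // Timestamp the event now, queue it, and flush if the batch is full.
  emit(type: string, data: Record<string, unknown>): void {
    this.queue.push({ type, timestamp: new Date().toISOString(), data });
    if (this.queue.length >= this.maxBatchSize) this.flush();
  }

  // Send whatever is queued; a timer would normally call this as well.
  flush(): void {
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.send(batch);
  }
}
```

Because the timestamp is taken in `emit` rather than at send time, batching delays delivery slightly without losing the precise moment of the status change.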

### Status Tracking System

The satellite tracks per-user MCP server **instance** health through a 12-state status system that drives tool availability and automatic recovery.

**Per-User Status Tracking:**

* **Status Location**: `mcpServerInstances` table (per user)
* **No Installation Status**: Status fields completely removed from `mcpServerInstallations`
* **Independent Tracking**: Each team member has independent status for each MCP server
* **User-Specific Filtering**: Users see only tools from their OWN instances that are online

**Status Values:**

* User configuration: `awaiting_user_config` (new - user hasn't configured required user-level fields)
* Instance lifecycle: `provisioning`, `command_received`, `connecting`, `discovering_tools`, `syncing_tools`
* Healthy state: `online` (tools available)
* Configuration changes: `restarting`
* Failure states: `offline`, `error`, `requires_reauth`, `permanently_failed`

**Status Integration:**

* **Tool Filtering**: Tools from user's non-online instances hidden from discovery
* **Auto-Recovery**: Offline instances auto-recover when responsive
* **Event Emission**: Status changes emitted immediately to backend with `user_id` field
* **Backend Filtering**: Instances with `awaiting_user_config` NOT sent to satellite (prevents spawn crashes)
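The status values and the two filtering rules above translate naturally into a union type and two small filters. The `InstanceRecord` shape and function names are hypothetical; the status strings are the ones listed on this page.

```typescript
type InstanceStatus =
  | "awaiting_user_config"
  | "provisioning"
  | "command_received"
  | "connecting"
  | "discovering_tools"
  | "syncing_tools"
  | "online"
  | "restarting"
  | "offline"
  | "error"
  | "requires_reauth"
  | "permanently_failed";

// Hypothetical record shape for one user's instance of one MCP server.
interface InstanceRecord {
  userId: string;
  serverSlug: string;
  status: InstanceStatus;
  tools: string[];
}

// Tool filtering: a user only sees tools from their own instances that are online.
function visibleTools(instances: InstanceRecord[], userId: string): string[] {
  const tools: string[] = [];
  for (const instance of instances) {
    if (instance.userId !== userId || instance.status !== "online") continue;
    for (const tool of instance.tools) tools.push(`${instance.serverSlug}:${tool}`);
  }
  return tools;
}

// Backend filtering: instances still awaiting user configuration are never sent to the satellite.
function instancesToSend(instances: InstanceRecord[]): InstanceRecord[] {
  return instances.filter((i) => i.status !== "awaiting_user_config");
}
```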

See [Status Tracking](/development/satellite/status-tracking) for complete status lifecycle and transitions.
See [Instance Lifecycle](/development/satellite/instance-lifecycle) for per-user instance creation and management.

### Log Capture System

The satellite captures and batches two types of logs for debugging and monitoring: **server logs** (stderr output) and **request logs** (tool execution with full request/response data).

See [Log Capture](/development/satellite/log-capture) for buffering implementation, batching configuration, backend storage limits, and privacy controls.

## Security Architecture

### Current Security (No Authentication)

**Session-Based Isolation:**

* **Cryptographic Session IDs**: 32-byte secure identifiers
* **Session Timeout**: 30-minute automatic cleanup
* **Activity Tracking**: Prevents session hijacking
* **Error Handling**: Secure error responses

### Security Features (Implemented)

**Per-User Instance Isolation:**

* **Process Isolation**: Each user's instance runs in an isolated process
* **Independent Lifecycle**: Terminating one user's process doesn't affect teammates
* **User-Specific Config**: Merged Template + Team + User configuration per instance
* **Status Isolation**: Each user's instance has independent status tracking

**Team and User Separation:**

* **OAuth Token Context**: Team ID AND User ID extracted from tokens
* **Instance Resolution**: Tool calls route to the user's own instance, never a teammate's
* **Database Separation**: `mcpServerInstances` table tracks per-user instances

**Resource Management (stdio processes):**

* **nsjail Isolation**: PID, network, filesystem isolation in production
* **Resource Quotas**: Unlimited virtual memory (`rlimit_as=inf`), 512MB physical RAM via cgroup when enabled, 60s CPU time, 1000 processes
* **Development Mode**: Direct spawn() without isolation for cross-platform development

**Authentication & Authorization:**

* **OAuth 2.1 Resource Server**: Backend token validation with 5-minute caching
* **User Context**: Automatic user and team resolution from tokens
* **Per-User Access Control**: Users only access their OWN instances
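The 5-minute validation cache can be sketched as a TTL map from token to resolved context. `TokenContextCacheSketch` and the `TokenContext` shape are hypothetical, and a caller would perform the Backend introspection on a cache miss; only the 5-minute lifetime comes from the list above.

```typescript
interface TokenContext {
  userId: string;
  teamId: string;
}

const TOKEN_CACHE_TTL_MS = 5 * 60 * 1000; // 5-minute caching

class TokenContextCacheSketch {
  private entries = new Map<string, { context: TokenContext; expiresAt: number }>();
  private now: () => number;

  // `now` is injectable so expiry can be tested deterministically.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Returns the cached context, or undefined if missing or expired.
  get(token: string): TokenContext | undefined {
    const entry = this.entries.get(token);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(token);
      return undefined;
    }
    return entry.context;
  }

  // Store the result of a successful introspection call.
  set(token: string, context: TokenContext): void {
    this.entries.set(token, { context, expiresAt: this.now() + TOKEN_CACHE_TTL_MS });
  }
}
```

A short lifetime like this trades a small window of stale authorization for far fewer round trips to the Backend on busy sessions.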

See [Team Isolation](/development/satellite/team-isolation) for complete implementation details.

## MCP Server Management

### Dual MCP Server Support

**stdio Subprocess Servers:**

* **Per-User Instances**: Each team member has their own process for each MCP server
* **Multi-Runtime Support**: Node.js (npx) and Python (uvx) runtimes with runtime-aware isolation
  * **Python Enhancements**: Auto-detects simple scripts vs installable packages, smart Python version selection avoids bleeding-edge versions, direct dependency installation for standalone scripts
* **Local Execution**: MCP servers as child processes with stdio communication
* **JSON-RPC Communication**: Full MCP protocol 2024-11-05 over stdin/stdout
* **Process Lifecycle**: Per-user spawn, monitor, auto-restart (max 3 attempts), terminate
* **Instance Isolation**: Processes tracked by `team_id` AND `user_id` with independent lifecycles
* **ProcessId Format**: `{server_slug}-{team_slug}-{user_slug}-{installation_id}`
* **Tool Discovery**: Automatic tool caching with per-user namespacing
* **Resource Limits**: nsjail in production with auto-detected cgroup enforcement
  * Virtual memory unlimited (`rlimit_as=inf`): Node.js WASM requires ~10GB virtual address space
  * 512MB physical memory (`cgroup_mem_max`): active only when systemd `Delegate=yes` is configured
  * 1000 processes (`cgroup_pids_max` + `rlimit_nproc`): adequate for package managers
  * 60s CPU time limit
  * Runtime-specific cache directories: `/mcp-cache/node/{team_id}`, `/mcp-cache/python/{team_id}`
* **Development Mode**: Plain spawn() on all platforms for easy debugging
* **Runtime Examples**:
  * Node.js: Sequential Thinking (`npx @modelcontextprotocol/server-sequential-thinking`)
  * Python: DuckDuckGo (`uvx duckduckgo-mcp-server`)
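The ProcessId format and the runtime-specific cache paths above can be expressed as two small helpers. The function and field names are hypothetical; the string formats are the ones given in the list.

```typescript
interface ProcessKey {
  serverSlug: string;
  teamSlug: string;
  userSlug: string;
  installationId: string;
}

// Format: {server_slug}-{team_slug}-{user_slug}-{installation_id}
function buildProcessId(key: ProcessKey): string {
  return `${key.serverSlug}-${key.teamSlug}-${key.userSlug}-${key.installationId}`;
}

type Runtime = "node" | "python";

// Format: /mcp-cache/{runtime}/{team_id}
function cacheDirFor(runtime: Runtime, teamId: string): string {
  return `/mcp-cache/${runtime}/${teamId}`;
}

const aliceId = buildProcessId({
  serverSlug: "sequential-thinking",
  teamSlug: "acme",
  userSlug: "alice",
  installationId: "inst123",
});
const bobId = buildProcessId({
  serverSlug: "sequential-thinking",
  teamSlug: "acme",
  userSlug: "bob",
  installationId: "inst123",
});
console.log(aliceId, bobId, cacheDirFor("node", "team42"));
```

Two teammates using the same installation get distinct process IDs because the user slug is part of the key, while the cache directory is shared per team and runtime.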

**HTTP Proxy Servers:**

* **External Endpoints**: Proxy to remote MCP servers
* **Load Balancing**: Distribute requests across instances
* **Health Monitoring**: Endpoint availability checks
* **Tool Discovery**: Automatic at startup from remote endpoints

### Process Management

**Lifecycle Operations:**

```
Configuration → Spawn → Monitor → Health Check → Restart/Terminate
      │           │        │          │              │
      │           │        │          │              │
   Backend     Child     Metrics   Failure      Cleanup
   Command    Process   Collection Detection   Resources
```

**Health Monitoring:**

* **Process Health**: CPU, memory, responsiveness
* **MCP Protocol**: Tool availability, response times
* **Automatic Recovery**: Restart failed processes
* **Resource Limits**: Enforce team quotas
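Automatic recovery with a bounded number of attempts can be sketched as a per-process counter. `RestartPolicySketch` is a hypothetical name, and resetting the counter once a process is healthy again is an assumption; the limit of 3 attempts comes from the stdio server description earlier on this page.

```typescript
const MAX_RESTART_ATTEMPTS = 3;

type RestartDecision = "restart" | "give_up";

class RestartPolicySketch {
  // Restart attempts used so far, keyed by process ID.
  private attempts = new Map<string, number>();

  // Called when a process exits unexpectedly or fails a health check.
  onFailure(processId: string): RestartDecision {
    const used = this.attempts.get(processId) ?? 0;
    if (used >= MAX_RESTART_ATTEMPTS) return "give_up";
    this.attempts.set(processId, used + 1);
    return "restart";
  }

  // Assumed behavior: a process that becomes healthy gets a fresh budget.
  onHealthy(processId: string): void {
    this.attempts.delete(processId);
  }
}
```

Keeping the counter per process ID means one user's crashing instance exhausts only its own budget and leaves teammates' instances untouched.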

## Technical Implementation Details

### Current Implementation Specifications

* **Session ID Length**: 32 bytes base64url encoded
* **Session Timeout**: 30 minutes of inactivity
* **JSON-RPC Version**: 2.0 strict compliance
* **HTTP Framework**: Fastify with JSON Schema validation
* **Logging**: Pino structured logging with operation tracking
* **Error Handling**: Complete HTTP status code mapping

### Current Resource Isolation Specifications

* **Virtual Memory Limit**: unlimited (`rlimit_as=inf`). Node.js v24 WASM (undici HTTP parser) reserves ~10GB of virtual address space; this is virtual, not physical RAM
* **Physical Memory Limit**: 512MB per MCP server process via cgroup. Active only when the satellite runs as a systemd service with `Delegate=yes`; falls back to rlimit-only otherwise
* **CPU Limit**: 60s CPU time limit
* **Process Limit**: 1000 processes per MCP server (accommodates package managers like npm, uvx)
* **Process Timeout**: 3-minute idle timeout for automatic cleanup
* **Isolation Method**: nsjail with Linux namespaces (PID, mount, UTS, IPC); cgroup v2 auto-detected at startup
* **Runtime-Aware Caching**: Separate cache directories per runtime (`/mcp-cache/node/{team_id}`, `/mcp-cache/python/{team_id}`)
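As a rough sketch, the limits above could be turned into nsjail command-line arguments as follows. The function and type names are hypothetical, the flags mirror the option names mentioned on this page plus nsjail's `--rlimit_cpu`, and the exact argument list the satellite passes is not documented here, so treat this as an assumption rather than the real invocation.

```typescript
interface ResourceLimits {
  cpuSeconds: number;
  maxProcesses: number;
  physicalMemoryBytes: number;
}

const DEFAULT_LIMITS: ResourceLimits = {
  cpuSeconds: 60,
  maxProcesses: 1000,
  physicalMemoryBytes: 512 * 1024 * 1024, // 512MB
};

// Build the limit-related arguments. cgroup flags are added only when
// cgroup enforcement is available; otherwise only rlimits apply.
function buildLimitArgs(limits: ResourceLimits, cgroupAvailable: boolean): string[] {
  const args = [
    "--rlimit_as", "inf", // virtual memory stays unlimited
    "--rlimit_cpu", String(limits.cpuSeconds),
    "--rlimit_nproc", String(limits.maxProcesses),
  ];
  if (cgroupAvailable) {
    args.push("--cgroup_mem_max", String(limits.physicalMemoryBytes));
    args.push("--cgroup_pids_max", String(limits.maxProcesses));
  }
  return args;
}

console.log(buildLimitArgs(DEFAULT_LIMITS, true).join(" "));
```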

### Technology Stack

* **HTTP Framework**: Fastify with @fastify/http-proxy (implemented)
* **Process Communication**: stdio JSON-RPC for local MCP servers (implemented)
* **Authentication**: OAuth 2.1 Resource Server with token introspection (implemented)
* **Per-User Instance Management**: ProcessManager with team and user tracking (implemented)
* **Logging**: Pino structured logging
* **Build System**: TypeScript + Webpack

### Development Setup

**Clone and Setup:**

```bash
git clone https://github.com/deploystackio/deploystack.git
cd deploystack/services/satellite
npm install
cp .env.example .env
npm run dev
```

**Test MCP Transport:**

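```bash
# Test MCP connection (Streamable HTTP requires an Accept header covering JSON and SSE)
curl -i -X POST "http://localhost:3001/mcp" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":"1","method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"curl-test","version":"0.0.0"}}}'

# Test SSE streaming (set SESSION_ID to the Mcp-Session-Id header returned by initialize)
curl -N "http://localhost:3001/mcp" \
  -H "Accept: text/event-stream" \
  -H "Mcp-Session-Id: $SESSION_ID"
```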

For testing the hierarchical router (tool discovery and execution), see [Hierarchical Router Implementation](/development/satellite/hierarchical-router).

**MCP Client Configuration:**

```json
{
  "mcpServers": {
    "deploystack-satellite": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-fetch"],
      "env": {
        "MCP_SERVER_URL": "http://localhost:3001/mcp"
      }
    }
  }
}
```

## Implementation Status

The satellite service has completed its MCP transport implementation and backend integration. The current implementation provides:

**MCP Transport Layer:**

* **Complete MCP Transport Layer**: SSE, SSE Messaging, Streamable HTTP
* **Session Management**: Cryptographically secure with automatic cleanup
* **JSON-RPC 2.0 Compliance**: Full protocol support with error handling

**Backend Integration:**

* **Command Polling Service**: Adaptive polling with three modes (normal/immediate/error)
* **Dynamic Configuration Management**: Replaces hardcoded MCP server configurations
* **Dynamic Command Resolution**: Resolves tool paths at startup (`node`, `npm`, `python3`, `uvx`, etc.) from system PATH
* **Command Processing**: HTTP MCP server management (`spawn`, `kill`, `restart`, `health_check`)
* **Heartbeat Service**: Process status reporting and system metrics
* **Configuration Sync**: Real-time MCP server configuration updates
* **Event System**: Real-time event emission with automatic batching (13 event types including tool metadata)

**Runtime Validation:**

* **System Runtime Check**: Validates required tools exist at startup (`validateSystemRuntimes`)
* **Command Path Resolution**: Finds absolute paths using `which` with search priority for user-local installations (`~/.local/bin`, `/usr/local/bin`, `/usr/bin`)
* **Path Validation**: Security checks on resolved paths (allowed directories only, executable permissions verified)
* **Memory Caching**: Stores validated paths in `DEPLOYSTACK_COMMAND_CACHE` for runtime use

This allows the satellite to work across different installation methods (system packages, Homebrew, pip --user, custom installs).
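The path validation step can be sketched as a pure check that a resolved path is absolute, free of traversal segments, and inside an allowed directory. The function name is hypothetical, the allowed directories are passed in already expanded (so `~/.local/bin` would be given as a full home path), and the executable-permission check is left out because it needs filesystem access.

```typescript
// Returns true only if `resolvedPath` is absolute, has no ".." segments,
// and sits under one of the allowed directories.
function isAllowedCommandPath(resolvedPath: string, allowedDirs: string[]): boolean {
  if (resolvedPath.charAt(0) !== "/") return false;
  if (resolvedPath.split("/").indexOf("..") !== -1) return false;
  return allowedDirs.some((dir) => {
    // Compare against "dir/" so "/usr/bin" does not match "/usr/binx/node".
    const prefix = dir.charAt(dir.length - 1) === "/" ? dir : dir + "/";
    return resolvedPath.indexOf(prefix) === 0;
  });
}

// Hypothetical in-memory cache of validated command paths.
const validatedCommands = new Map<string, string>();

function cacheIfAllowed(command: string, resolvedPath: string, allowedDirs: string[]): boolean {
  if (!isAllowedCommandPath(resolvedPath, allowedDirs)) return false;
  validatedCommands.set(command, resolvedPath);
  return true;
}
```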

**Foundation Infrastructure:**

* **HTTP Server**: Fastify with Swagger documentation
* **Logging System**: Pino with structured logging
* **Build Pipeline**: TypeScript compilation and bundling
* **Development Workflow**: Hot reload and code quality tools
* **Background Jobs System**: Cron-like job management for recurring tasks

For details on the background jobs system, see [Background Jobs System](/development/satellite/background-jobs).
