HTML Sanitization

Overview

HTML sanitization is a critical security measure that protects DeployStack from Cross-Site Scripting (XSS) attacks by ensuring user-provided content is safe to render in HTML contexts. The backend uses a centralized three-function approach powered by the sanitize-html library.

The sanitization system provides:

XSS Prevention: Blocks malicious scripts, event handlers, and dangerous protocols
Content Preservation: Maintains legitimate formatting while removing threats
Removal Tracking: Monitors sanitization impact for security insights
Single Source of Truth: All sanitization logic centralized in one utility

Centralized Sanitization Utility

All HTML sanitization functions are centralized in services/backend/src/utils/sanitization.ts.

Benefits:

No duplicate sanitization code across the codebase
Consistent security behavior application-wide
Easier to audit and maintain
Comprehensive test coverage (44 tests)

The backend recently migrated from isomorphic-dompurify to sanitize-html, eliminating ~20MB of dependencies and fixing module errors on remote servers while maintaining equivalent security.

The Three Sanitization Functions

sanitizeText()

Purpose: Escape HTML special characters as entities so the text renders literally as plain text.

Function Signature:

sanitizeText(text: string): string

Use Cases:

Error messages in OAuth callbacks
User input displayed in HTML attributes
Any text that must display literally (not as HTML)

Security: Converts dangerous characters to HTML entities:

<script> → &lt;script&gt;
Escapes: &, <, >, ", '
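The escaping described above can be sketched as follows. This is a minimal illustration, not the code in sanitization.ts: the name escapeHtmlSketch is hypothetical, and the entity chosen for each character (for example &#39; for the apostrophe) is an assumption.

```typescript
// Hypothetical sketch; the real sanitizeText() may differ in detail.
const ENTITY_MAP: Record<string, string> = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;', // assumed entity for the apostrophe
};

function escapeHtmlSketch(text: string): string {
  // One pass over the input, so the "&" of an entity produced here
  // is never escaped a second time.
  return text.replace(/[&<>"']/g, (ch) => ENTITY_MAP[ch]);
}

console.log(escapeHtmlSketch('<script>alert("x")</script>'));
// &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```

Input that already contains an entity, such as &lt;, is escaped again to &amp;lt;, so it still displays literally.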

Example:

import { sanitizeText } from '../../../utils/sanitization';

// OAuth error callback
const errorMsg = query.error_description || query.error;
const html = `<p><strong>Error:</strong> ${sanitizeText(errorMsg)}</p>`;

sanitizeForEmail()

Purpose: Sanitize user-provided text for email rendering with newline preservation.

Function Signature:

sanitizeForEmail(text: string): string

Allowed Tags: Only <br /> tags for line breaks.

Defense-in-Depth Approach:

Normalize line endings (\r\n → \n)
Trim whitespace
Escape HTML entities (primary defense)
Convert newlines to <br /> tags
Sanitize with sanitize-html (secondary defense)
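The five steps above can be sketched roughly as below. This is an illustration under assumptions, not the actual sanitizeForEmail(): the names emailPipelineSketch and escapeEntities are hypothetical, and the final sanitize-html pass is represented by a finalPass callback that defaults to the identity function so the sketch runs without the library.

```typescript
// Hypothetical helper: escapes the five characters listed for sanitizeText().
function escapeEntities(text: string): string {
  const map: Record<string, string> = {
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;',
  };
  return text.replace(/[&<>"']/g, (ch) => map[ch]);
}

// finalPass stands in for the secondary sanitize-html call (assumption).
function emailPipelineSketch(
  text: string,
  finalPass: (html: string) => string = (html) => html,
): string {
  const normalized = text.replace(/\r\n/g, '\n');       // 1. normalize line endings
  const trimmed = normalized.trim();                     // 2. trim whitespace
  const escaped = escapeEntities(trimmed);               // 3. escape entities (primary defense)
  const withBreaks = escaped.replace(/\n/g, '<br />');   // 4. newlines become <br />
  return finalPass(withBreaks);                          // 5. secondary defense
}

console.log(emailPipelineSketch('Hello\nWorld'));
// Hello<br />World
console.log(emailPipelineSketch('The <button> component'));
// The &lt;button&gt; component
```

Escaping happens before the newline conversion, so the only tags that can reach the final pass are the <br /> tags the function inserts itself.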

Behavior:

Input:  "Hello\nWorld"
Output: "Hello<br />World"

Security Features:

Escapes all HTML except <br /> tags
Prevents XSS while preserving email formatting
Users can type HTML-like text (e.g., “The <button> component”) and it displays literally

Example:

import { sanitizeForEmail } from './sanitization';

// Sanitize user feedback message
const sanitizedMessage = sanitizeForEmail(userFeedbackText);

// Use in email template (Pug syntax with != for unescaped output)
// feedback.pug: p!= feedbackMessage

The email sanitization wrapper at services/backend/src/utils/emailSanitization.ts provides backwards compatibility with existing code.

sanitizeMarkdown()

Purpose: Sanitize GitHub README and markdown content for safe display with rich formatting.

Function Signature:

sanitizeMarkdown(content: string): {
  content: string;
  originalLength: number;
  sanitizedLength: number;
  removalBytes: number;
  removalPercentage: number;
}

Allowed Tags (38+ tags):

Text formatting: p, br, strong, em, u, s, del, ins
Headings: h1, h2, h3, h4, h5, h6
Lists: ul, ol, li, dl, dt, dd
Links and code: a, code, pre, blockquote
Images: img, picture, source
Tables: table, thead, tbody, tfoot, tr, td, th
Containers: div, span, section, article
GitHub-specific: details, summary
Other: hr, sup, sub

Security Features:

Blocks <script>, <iframe>, <object>, <embed> tags
Prevents javascript: URLs in links
Removes event handlers (onclick, onerror, onload, etc.)
Allows data URIs for images only
Allows safe protocols: http, https, mailto, tel
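The tag list and security features above could map onto a sanitize-html options object along these lines. This is a sketch only: the option names (allowedTags, allowedAttributes, allowedSchemes, allowedSchemesByTag, allowProtocolRelative) are real sanitize-html options, but the values of allowedAttributes and allowProtocolRelative are assumptions, and the local OptionsSketch type exists only so the block compiles without the library.

```typescript
// Local stand-in type; sanitize-html ships its own option types.
interface OptionsSketch {
  allowedTags: string[];
  allowedAttributes: Record<string, string[]>;
  allowedSchemes: string[];
  allowedSchemesByTag: Record<string, string[]>;
  allowProtocolRelative: boolean;
}

const markdownOptionsSketch: OptionsSketch = {
  // Tags taken from the "Allowed Tags" list above.
  allowedTags: [
    'p', 'br', 'strong', 'em', 'u', 's', 'del', 'ins',
    'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
    'ul', 'ol', 'li', 'dl', 'dt', 'dd',
    'a', 'code', 'pre', 'blockquote',
    'img', 'picture', 'source',
    'table', 'thead', 'tbody', 'tfoot', 'tr', 'td', 'th',
    'div', 'span', 'section', 'article',
    'details', 'summary',
    'hr', 'sup', 'sub',
  ],
  // Assumed attribute allowlist; event handlers are dropped because
  // no on* attribute is listed here.
  allowedAttributes: {
    a: ['href', 'title'],
    img: ['src', 'alt', 'title'],
  },
  allowedSchemes: ['http', 'https', 'mailto', 'tel'],
  // data: URIs are permitted for images only.
  allowedSchemesByTag: { img: ['http', 'https', 'data'] },
  allowProtocolRelative: false, // assumption
};

console.log(markdownOptionsSketch.allowedTags.length);
```

With the library installed, an object of this shape would be passed as the second argument to sanitizeHtml(dirty, options).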

Removal Tracking: The function returns statistics about content removal. Removal of more than 10% of the content may indicate malicious input and should be logged.

Example:

import { sanitizeMarkdown } from '../utils/sanitization';

const sanitizationResult = sanitizeMarkdown(decodedContent);

// Log security warning if significant removal
if (sanitizationResult.removalPercentage > 10) {
  logger.warn({
    removal_percentage: sanitizationResult.removalPercentage.toFixed(2),
    removal_bytes: sanitizationResult.removalBytes,
    repository_url: repositoryUrl
  }, 'High sanitization removal rate detected');
}

return {
  content: sanitizationResult.content,
  encoding: 'utf8'
};
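For illustration, the statistics fields could be derived from the input and output strings as sketched here. The field names come from the function signature above; the helper name removalStatsSketch is hypothetical, and measuring length with String.length (UTF-16 code units rather than encoded bytes) is an assumption about how the real utility counts.

```typescript
interface RemovalStatsSketch {
  originalLength: number;
  sanitizedLength: number;
  removalBytes: number;
  removalPercentage: number;
}

// Hypothetical helper, not the code in sanitization.ts.
function removalStatsSketch(original: string, sanitized: string): RemovalStatsSketch {
  const originalLength = original.length;
  const sanitizedLength = sanitized.length;
  // Clamp at zero in case sanitization lengthens the text (e.g. by escaping).
  const removalBytes = Math.max(0, originalLength - sanitizedLength);
  // Guard against division by zero for empty input.
  const removalPercentage =
    originalLength === 0 ? 0 : (removalBytes * 100) / originalLength;
  return { originalLength, sanitizedLength, removalBytes, removalPercentage };
}

const stats = removalStatsSketch('a'.repeat(100), 'a'.repeat(85));
console.log(stats.removalPercentage > 10); // true: 15% removed
```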

When to Use Which Function

Plain text only (no formatting needed)? → Use sanitizeText()

Use for:

Error messages
OAuth callbacks
HTML attributes
Any text that should display literally

Email content with newlines? → Use sanitizeForEmail()

Use for:

User feedback messages
Email notifications
Text with line breaks that need to render as <br /> tags

Rich markdown/HTML content? → Use sanitizeMarkdown()

Use for:

GitHub READMEs
Blog posts
Documentation
Any content with headings, links, images, tables

XSS Prevention

All three functions prevent common XSS attack vectors.

Test Vectors Blocked:

<script>alert(1)</script> → Removed or escaped
<img src=x onerror=alert(1)> → onerror attribute removed
<a href="javascript:alert(1)">click</a> → javascript: protocol blocked
<iframe src="javascript:alert(1)"> → iframe tag removed
<svg onload=alert(1)> → onload attribute removed
<body onload=alert(1)> → body tag removed
DOM clobbering attempts → Neutralized

Never use unsanitized user input directly in HTML context. Always use one of the three sanitization functions before rendering user content.

XSS vulnerabilities can lead to:

Account compromise through session theft
Session hijacking
Malicious code execution in user browsers
Sensitive data theft
Phishing attacks

Security Features

Whitelist Approach

Only explicitly allowed tags and attributes pass through
Everything else is escaped or removed
Safer than blacklist approach (which tries to block known bad patterns)

Protocol Validation

Only safe URL protocols allowed: http, https, mailto, tel
Blocks dangerous protocols: javascript:, data: (except for images), file:
Prevents protocol-relative URLs in strict mode
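The protocol rules above can be illustrated with a small standalone check. In the backend this validation is handled by sanitize-html; the function isAllowedUrlSketch below is a hypothetical re-implementation for explanation only, and it assumes protocol-relative URLs are always rejected and that scheme-less relative URLs are accepted.

```typescript
const SAFE_SCHEMES = new Set(['http', 'https', 'mailto', 'tel']);

// Hypothetical sketch of scheme allowlisting; tag is the element the URL appears on.
function isAllowedUrlSketch(url: string, tag: string): boolean {
  // Remove whitespace and control characters, which browsers ignore
  // inside a scheme (e.g. "java\tscript:").
  const cleaned = url.replace(/[\u0000-\u0020]/g, '');
  // Protocol-relative URLs inherit the page scheme; rejected here by assumption.
  if (cleaned.startsWith('//')) return false;
  const match = /^([a-zA-Z][a-zA-Z0-9+.-]*):/.exec(cleaned);
  // No scheme: treat as a relative URL.
  if (!match) return true;
  const scheme = match[1].toLowerCase();
  // data: URIs are allowed for images only.
  if (scheme === 'data') return tag === 'img';
  return SAFE_SCHEMES.has(scheme);
}

console.log(isAllowedUrlSketch('javascript:alert(1)', 'a'));          // false
console.log(isAllowedUrlSketch('https://example.com', 'a'));          // true
console.log(isAllowedUrlSketch('data:image/png;base64,AAAA', 'img')); // true
```

The check is an allowlist: an unknown scheme such as file: is rejected because it is absent from SAFE_SCHEMES, not because it appears on a list of known-bad schemes.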

Event Handler Removal

All JavaScript event handlers stripped automatically
Blocked handlers: onclick, onerror, onload, onfocus, onmouseover, etc.
Prevents inline JavaScript execution
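Event handler removal follows from the whitelist approach: an attribute survives only if it is explicitly allowed for its tag, so no list of handler names is needed. The sketch below shows the idea in isolation; filterAttributesSketch and the allowlist contents are hypothetical and are not taken from the backend configuration.

```typescript
type Attributes = Record<string, string>;

// Hypothetical per-tag attribute allowlist.
const ALLOWED_ATTRIBUTES_SKETCH = new Map<string, string[]>([
  ['a', ['href', 'title']],
  ['img', ['src', 'alt', 'title']],
]);

function filterAttributesSketch(tag: string, attrs: Attributes): Attributes {
  const allowed = new Set(ALLOWED_ATTRIBUTES_SKETCH.get(tag) ?? []);
  const kept: Attributes = {};
  for (const [name, value] of Object.entries(attrs)) {
    const lower = name.toLowerCase(); // attribute names are case-insensitive in HTML
    if (allowed.has(lower)) {
      kept[lower] = value;
    }
  }
  return kept;
}

console.log(filterAttributesSketch('img', { src: 'x.png', onerror: 'alert(1)' }));
// { src: 'x.png' }
```

A handler the author never anticipated is dropped the same way as onerror, because it is simply not on the list.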

Server-Side Library

No DOM emulation (unlike the previous isomorphic-dompurify)
No CSS parsing dependencies
Faster and lighter with better performance
Designed specifically for server-side Node.js environments

Automatic Escaping

Text content automatically escaped by sanitize-html
No need for manual HTML entity encoding in most cases
Reduces risk of escaping bugs

Implementation Files

Source Files:

services/backend/src/utils/sanitization.ts - Centralized sanitization utility
services/backend/src/utils/emailSanitization.ts - Email wrapper for backwards compatibility
services/backend/src/services/githubReadmeService.ts - README sanitization implementation
services/backend/src/routes/mcp/installations/callback.ts - OAuth error sanitization

Test Files:

tests/unit/utils/sanitization.test.ts - 44 comprehensive tests

Testing

The sanitization utility has comprehensive test coverage.

Test Categories:

HTML entity escaping (6 tests)
Email formatting with newlines (10 tests)
Markdown/HTML tag handling (28 tests)
XSS prevention suite (multiple attack vectors tested)

All 44 tests pass ✓

Run Tests:

cd services/backend
npm run test:unit -- sanitization.test.ts

When adding new sanitization use cases, write tests first to ensure proper XSS prevention before implementation.

Migration from isomorphic-dompurify

The backend migrated from isomorphic-dompurify to sanitize-html for better performance and reliability.

Migration Benefits:

Removed ~20MB of dependencies (jsdom, CSS parsers)
Fixed module errors on remote servers (@csstools/css-parser-algorithms missing)
Eliminated 2 duplicate escapeHtml() functions across the codebase
Centralized all sanitization in single utility
Better performance (no DOM emulation overhead)
Equivalent security with explicit whitelist approach

Key Difference:

sanitize-html outputs self-closing tags: <br /> instead of <br>
All tests and expectations updated to match new format

Related Documentation

Security Policy - Overall backend security guidelines including password hashing, session management, and encryption
API Security - API-specific security patterns including authorization hooks and RBAC
Mail System - Email sending system that uses sanitizeForEmail()
