Overview
HTML sanitization is a critical security measure that protects DeployStack from Cross-Site Scripting (XSS) attacks by ensuring user-provided content is safe to render in HTML contexts. The backend uses a centralized three-function approach powered by the sanitize-html library.
The sanitization system provides:
- XSS Prevention: Blocks malicious scripts, event handlers, and dangerous protocols
- Content Preservation: Maintains legitimate formatting while removing threats
- Removal Tracking: Monitors sanitization impact for security insights
- Single Source of Truth: All sanitization logic centralized in one utility
Centralized Sanitization Utility
All HTML sanitization functions are centralized in services/backend/src/utils/sanitization.ts. Benefits:
- No duplicate sanitization code across the codebase
- Consistent security behavior application-wide
- Easier to audit and maintain
- Comprehensive test coverage (44 tests)
The backend recently migrated from isomorphic-dompurify to sanitize-html, eliminating ~20MB of dependencies and fixing module errors on remote servers while maintaining equivalent security.
The Three Sanitization Functions
sanitizeText()
Purpose: Escape all HTML entities for plain text rendering.
Use for:
- Error messages in OAuth callbacks
- User input displayed in HTML attributes
- Any text that must display literally (not as HTML)
Behavior:
- Escapes &, <, >, " and '
- Example: <script> becomes &lt;script&gt;
sanitizeForEmail()
Purpose: Sanitize user-provided text for email rendering with newline preservation. The result is HTML-safe text that uses <br /> tags for line breaks.
Defense-in-Depth Approach:
- Normalize line endings (\r\n → \n)
- Trim whitespace
- Escape HTML entities (primary defense)
- Convert newlines to <br /> tags
- Sanitize with sanitize-html (secondary defense)
Behavior:
- Escapes all HTML except <br /> tags
- Prevents XSS while preserving email formatting
- Users can type HTML-like text (e.g., “The <button> component”) and it displays literally
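The first four steps of the defense-in-depth pipeline can be sketched without any dependencies. This is a minimal sketch, not the actual sanitizeForEmail() source: the names escapeHtml and sanitizeForEmailSketch are hypothetical, and the final sanitize-html pass appears only as a comment because it needs the third-party package; its options there are an assumed shape.

```typescript
// Hypothetical helper: escape the five HTML-significant characters.
// "&" is replaced first so the entities produced later are not re-escaped.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical name; NOT the actual sanitizeForEmail() implementation.
function sanitizeForEmailSketch(input: string): string {
  const normalized = input.replace(/\r\n/g, "\n"); // 1. normalize line endings
  const trimmed = normalized.trim(); // 2. trim whitespace
  const escaped = escapeHtml(trimmed); // 3. escape entities (primary defense)
  const withBreaks = escaped.replace(/\n/g, "<br />"); // 4. newlines -> <br />
  // 5. Secondary defense (assumed options, requires the sanitize-html package):
  //    return sanitizeHtml(withBreaks, { allowedTags: ["br"], allowedAttributes: {} });
  return withBreaks;
}

console.log(sanitizeForEmailSketch("  Hi team,\r\nThe <button> component breaks.\n"));
// prints Hi team,<br />The &lt;button&gt; component breaks.
```

Escaping before inserting <br /> is what lets the tags survive: the only raw markup left in the string is the markup the function itself added.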
The email sanitization wrapper at services/backend/src/utils/emailSanitization.ts provides backwards compatibility with existing code.
sanitizeMarkdown()
Purpose: Sanitize GitHub README and markdown content for safe display with rich formatting.
Allowed Tags:
- Text formatting: p, br, strong, em, u, s, del, ins
- Headings: h1, h2, h3, h4, h5, h6
- Lists: ul, ol, li, dl, dt, dd
- Links and code: a, code, pre, blockquote
- Images: img, picture, source
- Tables: table, thead, tbody, tfoot, tr, td, th
- Containers: div, span, section, article
- GitHub-specific: details, summary
- Other: hr, sup, sub
Security:
- Blocks <script>, <iframe>, <object>, <embed> tags
- Prevents javascript: URLs in links
- Removes event handlers (onclick, onerror, onload, etc.)
- Allows data URIs for images only
- Allows safe protocols: http, https, mailto, tel
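The allowlist above maps naturally onto a sanitize-html options object. The option names allowedTags, allowedAttributes, allowedSchemes and allowedSchemesByTag are real sanitize-html options, but this object is a hypothetical sketch: the variable name is invented and the per-tag attribute lists are assumptions, since this page does not list the allowed attributes.

```typescript
// Hypothetical options object mirroring the documented allowlist.
// Attribute lists are ASSUMPTIONS; the page only documents tags and schemes.
const markdownSanitizeOptions = {
  allowedTags: [
    "p", "br", "strong", "em", "u", "s", "del", "ins",
    "h1", "h2", "h3", "h4", "h5", "h6",
    "ul", "ol", "li", "dl", "dt", "dd",
    "a", "code", "pre", "blockquote",
    "img", "picture", "source",
    "table", "thead", "tbody", "tfoot", "tr", "td", "th",
    "div", "span", "section", "article",
    "details", "summary",
    "hr", "sup", "sub",
  ],
  // Any attribute not listed here, including every on* event handler, is dropped.
  allowedAttributes: {
    a: ["href", "title"],
    img: ["src", "alt", "title", "width", "height"],
    source: ["srcset", "media", "type"],
  },
  // Schemes accepted in URL attributes for all tags...
  allowedSchemes: ["http", "https", "mailto", "tel"],
  // ...except <img>, which additionally accepts data: URIs.
  allowedSchemesByTag: { img: ["http", "https", "data"] },
};

// Usage (requires the sanitize-html package):
//   import sanitizeHtml from "sanitize-html";
//   const safe = sanitizeHtml(readmeHtml, markdownSanitizeOptions);
```

Tags absent from allowedTags, such as script, iframe, object and embed, never need to be named: the allowlist rejects them by omission.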
When to Use Which Function
Plain text only (no formatting needed)? → Use sanitizeText()
Use for:
- Error messages
- OAuth callbacks
- HTML attributes
- Any text that should display literally
Email content with newlines? → Use sanitizeForEmail()
Use for:
- User feedback messages
- Email notifications
- Text with line breaks that need to render as <br /> tags
Rich markdown/HTML content? → Use sanitizeMarkdown()
Use for:
- GitHub READMEs
- Blog posts
- Documentation
- Any content with headings, links, images, tables
XSS Prevention
All three functions prevent common XSS attack vectors.
Test Vectors Blocked:
- <script>alert(1)</script> → Removed or escaped
- <img src=x onerror=alert(1)> → onerror attribute removed
- <a href="javascript:alert(1)">click</a> → javascript: protocol blocked
- <iframe src="javascript:alert(1)"> → iframe tag removed
- <svg onload=alert(1)> → onload attribute removed
- <body onload=alert(1)> → body tag removed
- DOM clobbering attempts → Neutralized
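The vectors above lend themselves to table-driven tests. The sketch below shows one way to encode them together with a rough output check; looksUnsafe and the regex list are hypothetical test helpers, not part of sanitization.ts. The check is a heuristic, not a sanitizer, and does not catch every obfuscation.

```typescript
// Rough heuristic for checking sanitizer OUTPUT in tests. Not a sanitizer and
// not exhaustive: it only flags the markers the listed vectors rely on.
// Escaped text such as "&lt;script&gt;" passes because it has no raw "<".
const DANGEROUS_PATTERNS: RegExp[] = [
  /<\s*(script|iframe|object|embed|body)\b/i, // forbidden tags
  /<[^>]*\son[a-z]+\s*=/i, // inline event handler attributes
  /<[^>]*\b(href|src)\s*=\s*["']?\s*javascript:/i, // javascript: URLs
];

function looksUnsafe(html: string): boolean {
  return DANGEROUS_PATTERNS.some((re) => re.test(html));
}

// The documented vectors, as raw (unsanitized) input.
const XSS_VECTORS: string[] = [
  "<script>alert(1)</script>",
  "<img src=x onerror=alert(1)>",
  '<a href="javascript:alert(1)">click</a>',
  '<iframe src="javascript:alert(1)">',
  "<svg onload=alert(1)>",
  "<body onload=alert(1)>",
];

// Every raw vector trips the heuristic; a test would then assert that
// looksUnsafe(sanitize(vector)) is false for the sanitizer under test.
for (const v of XSS_VECTORS) console.log(looksUnsafe(v)); // true, six times
console.log(looksUnsafe("&lt;script&gt;alert(1)&lt;/script&gt;")); // false
```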
XSS vulnerabilities can lead to:
- Account compromise through session theft
- Session hijacking
- Malicious code execution in user browsers
- Sensitive data theft
- Phishing attacks
Security Features
Whitelist Approach
- Only explicitly allowed tags and attributes pass through
- Everything else is escaped or removed
- Safer than blacklist approach (which tries to block known bad patterns)
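The allowlist principle, and why event handlers disappear without being named, can be shown on an already-parsed attribute map. This is an illustrative sketch only: sanitize-html does the real parsing and filtering, and the function name filterAttributes and the tag/attribute lists are hypothetical.

```typescript
// Illustrative allowlist filter over an already-parsed attribute map.
// The tag/attribute lists are ASSUMPTIONS for the example.
const ALLOWED_ATTRIBUTES: Record<string, readonly string[]> = {
  a: ["href", "title"],
  img: ["src", "alt"],
};

function filterAttributes(
  tag: string,
  attrs: Record<string, string>,
): Record<string, string> {
  // Unknown tags get an empty allowlist, so all of their attributes are dropped.
  const allowed: readonly string[] = ALLOWED_ATTRIBUTES[tag] ?? [];
  const kept: Record<string, string> = {};
  for (const [name, value] of Object.entries(attrs)) {
    // Compare case-insensitively so "ONERROR" is treated like "onerror".
    const lower = name.toLowerCase();
    if (allowed.includes(lower)) kept[lower] = value;
  }
  return kept;
}

console.log(filterAttributes("img", { src: "x.png", onerror: "alert(1)", ALT: "logo" }));
// prints { src: 'x.png', alt: 'logo' }
```

A blacklist would need to enumerate every on* handler, including ones added by future browsers; the allowlist rejects all of them by default.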
Protocol Validation
- Only safe URL protocols allowed: http, https, mailto, tel
- Blocks dangerous protocols: javascript:, data: (except for images), file:
- Prevents protocol-relative URLs in strict mode
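Scheme validation of this kind can be sketched with the WHATWG URL parser built into Node.js. This is a minimal sketch under stated assumptions, not how sanitize-html implements it: isAllowedUrl and the placeholder origin BASE are hypothetical. It also always rejects protocol-relative URLs, whereas the page mentions that behavior only for a strict mode whose configuration is not described here.

```typescript
// BASE is a hypothetical placeholder origin used only to resolve relative links.
const BASE = "https://relative.invalid";
const ALLOWED_PROTOCOLS = new Set(["http:", "https:", "mailto:", "tel:"]);

function parseUrl(raw: string): URL | null {
  try {
    return new URL(raw, BASE);
  } catch {
    return null; // unparseable input
  }
}

// Hypothetical helper; `tag` is the element carrying the URL attribute.
function isAllowedUrl(raw: string, tag: string): boolean {
  const url = parseUrl(raw);
  if (url === null) return false;
  // Relative links such as "/docs" or "#usage" resolve onto BASE.
  if (url.origin === BASE) return true;
  // The parser ignores leading spaces/control chars and embedded tabs/newlines,
  // so strip them before checking whether the input named a scheme itself.
  const compact = raw.replace(/[\u0000-\u0020]/g, "");
  // No explicit scheme but a foreign host means protocol-relative ("//evil.example").
  if (!/^[a-z][a-z0-9+.-]*:/i.test(compact)) return false;
  // data: URIs are accepted for images only.
  if (url.protocol === "data:") return tag === "img";
  return ALLOWED_PROTOCOLS.has(url.protocol);
}

console.log(isAllowedUrl(" JaVaScRiPt:alert(1)", "a")); // false
console.log(isAllowedUrl("data:image/png;base64,AAAA", "img")); // true
```

Delegating to a real URL parser means case tricks and stray whitespace in the scheme are normalized the same way a browser would normalize them, rather than by hand-written string matching.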
Event Handler Removal
- All JavaScript event handlers stripped automatically
- Blocked handlers: onclick, onerror, onload, onfocus, onmouseover, etc.
- Prevents inline JavaScript execution
Server-Side Library
- No DOM emulation (unlike the previous isomorphic-dompurify)
- No CSS parsing dependencies
- Smaller footprint and faster execution, since no DOM is emulated
- Designed specifically for server-side Node.js environments
Automatic Escaping
- Text content automatically escaped by sanitize-html
- No need for manual HTML entity encoding in most cases
- Reduces risk of escaping bugs
Implementation Files
Source Files:
- services/backend/src/utils/sanitization.ts - Centralized sanitization utility
- services/backend/src/utils/emailSanitization.ts - Email wrapper for backwards compatibility
- services/backend/src/services/githubReadmeService.ts - README sanitization implementation
- services/backend/src/routes/mcp/installations/callback.ts - OAuth error sanitization
- tests/unit/utils/sanitization.test.ts - 44 comprehensive tests
Testing
The sanitization utility has comprehensive test coverage.
Test Categories:
- HTML entity escaping (6 tests)
- Email formatting with newlines (10 tests)
- Markdown/HTML tag handling (28 tests)
- XSS prevention suite (multiple attack vectors tested)
Migration from isomorphic-dompurify
The backend migrated from isomorphic-dompurify to sanitize-html for better performance and reliability.
Migration Benefits:
- Removed ~20MB of dependencies (jsdom, CSS parsers)
- Fixed module errors on remote servers (@csstools/css-parser-algorithms missing)
- Eliminated 2 duplicate escapeHtml() functions across the codebase
- Centralized all sanitization in a single utility
- Better performance (no DOM emulation overhead)
- Equivalent security with explicit whitelist approach
Output Format Change:
- sanitize-html outputs self-closing tags: <br /> instead of <br>
- All tests and expectations were updated to match the new format
Related Documentation
- Security Policy - Overall backend security guidelines including password hashing, session management, and encryption
- API Security - API-specific security patterns including authorization hooks and RBAC
- Mail System - Email sending system that uses sanitizeForEmail()

