ANSI escape sequences in logs interfere with regex filtering #873
Description
When container logs contain ANSI escape sequences (control characters for colors, cursor movements, etc.), regex filtering doesn't work as expected because the invisible control characters interfere with pattern matching.
This issue was discovered while investigating log filtering with netshoot containers, which often include shell output with ANSI escape sequences.
Current Behavior
For example, if a log line contains:
\x1b[31mERROR\x1b[0m: Connection failed
- Visually displayed as: ERROR: Connection failed (with red color)
- Actual string contains: \x1b[31mERROR\x1b[0m: Connection failed
- Regex ^ERROR match: ❌ Fails (line actually starts with \x1b)
- User expectation: ✅ Should match (user sees "ERROR" at the start)
Expected Behavior
Users should be able to filter logs based on what they visually see, not based on invisible control characters.
Examples of Affected Patterns
- Line-start matching: ^ERROR doesn't match lines that appear to start with "ERROR"
- Line-end matching: failed$ doesn't match lines that appear to end with "failed"
- Empty line filtering: ^$ doesn't match visually empty lines that contain only control characters
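The three failure modes above can be reproduced with the standard library alone. This is a minimal sketch, not code from this project: `starts_with`, `ends_with`, and `is_empty` stand in for the anchored patterns ^ERROR, failed$, and ^$, and the function names and sample lines are hypothetical.

```rust
// Stand-ins for the three anchored patterns, using only the standard library:
//   ^ERROR   -> starts_with("ERROR")
//   failed$  -> ends_with("failed")
//   ^$       -> is_empty()
pub fn matches_line_start(line: &str) -> bool {
    line.starts_with("ERROR")
}

pub fn matches_line_end(line: &str) -> bool {
    line.ends_with("failed")
}

pub fn matches_empty(line: &str) -> bool {
    line.is_empty()
}

// Illustrative raw lines (assumed data, not captured from a real container).
// Red "ERROR", reset, then bold "failed", reset.
pub const COLORED: &str = "\x1b[31mERROR\x1b[0m: Connection \x1b[1mfailed\x1b[0m";
// A line that renders as empty but contains a reset sequence.
pub const VISUALLY_EMPTY: &str = "\x1b[0m";
```

All three checks return false for the colored samples, while `matches_line_start("ERROR: Connection failed")` returns true for the same text without escape sequences.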
Impact
- Affected containers: Debug containers (netshoot), interactive shells, applications with colored output
- User experience: Users need to know about invisible control characters to write effective filters
- Workaround difficulty: Currently no workaround available
Technical Background
This project already has an ANSI escape sequence parser (src/ansi/parser.rs) that handles:
- CSI sequences (colors, cursor movements, etc.)
- Graphics rendering (SGR)
- Cursor control
However, the parser doesn't cover all possible control sequences (OSC, DCS, terminal-specific sequences, etc.).
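To illustrate how these sequence families differ, the sketch below classifies a sequence by the byte that follows ESC, using the 7-bit forms from ECMA-48 (ESC [ for CSI, ESC ] for OSC, ESC P for DCS). It does not reflect the structure of src/ansi/parser.rs; `EscapeKind` and `classify_escape` are hypothetical names introduced only for this example.

```rust
/// Families of escape sequences, keyed by the byte after ESC (0x1B).
#[derive(Debug, PartialEq, Eq)]
pub enum EscapeKind {
    Csi,   // ESC [  : colors (SGR), cursor movement, ...
    Osc,   // ESC ]  : window title, hyperlinks, ...
    Dcs,   // ESC P  : device control strings
    Other, // any other byte following ESC
}

/// Returns the family of the sequence at the start of `seq`,
/// or `None` if `seq` does not start with ESC plus an introducer byte.
pub fn classify_escape(seq: &str) -> Option<EscapeKind> {
    let mut chars = seq.chars();
    if chars.next()? != '\x1b' {
        return None;
    }
    Some(match chars.next()? {
        '[' => EscapeKind::Csi,
        ']' => EscapeKind::Osc,
        'P' => EscapeKind::Dcs,
        _ => EscapeKind::Other,
    })
}
```

For example, `classify_escape("\x1b[31m")` yields `Some(EscapeKind::Csi)`, while a title-setting sequence such as `"\x1b]0;title\x07"` yields `Some(EscapeKind::Osc)`.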
Proposed Solution
Maintain two versions of log content:
- Raw content: For display (preserves colors and formatting)
- Plain content: For filtering (all ANSI escape sequences removed)
Example implementation approach:
pub struct FilterableLogContent {
    pub raw: String,   // For display
    pub plain: String, // For filtering
}
Implementation Considerations
- Scope of stripping:
  - CSI sequences: \x1b[... (already parsed)
  - OSC sequences: \x1b]... (not yet covered)
  - Other control characters: C0/C1 control codes
- Stripping method:
  - Option A: Regex-based (simple, covers 95-99% of cases)
  - Option B: Parser-based (accurate but requires extending existing parser)
  - Option C: Hybrid (recommended)
- Performance impact: ANSI codes need to be stripped from every log line
- Memory impact: Storing both raw and plain versions roughly doubles string storage
- Compatibility: Ensure existing log display functionality isn't affected
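As a rough illustration of the considerations above, the following sketch pairs the proposed struct with a small hand-written stripper. It is an assumption-laden example, not the project's parser or a full implementation of Option B or C. `strip_ansi` and `FilterableLogContent::new` are hypothetical names. Only CSI sequences, OSC sequences (terminated by BEL or ESC \), and two-character escapes are removed. DCS strings, multi-byte charset designations, and bare C0/C1 control codes are left untouched.

```rust
pub struct FilterableLogContent {
    pub raw: String,   // For display
    pub plain: String, // For filtering
}

impl FilterableLogContent {
    /// Hypothetical constructor: strips once, when the line is ingested.
    pub fn new(raw: String) -> Self {
        let plain = strip_ansi(&raw);
        Self { raw, plain }
    }
}

/// Removes CSI and OSC sequences (and other two-character escapes) from `input`.
pub fn strip_ansi(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\x1b' {
            out.push(c);
            continue;
        }
        match chars.next() {
            // CSI: skip parameter/intermediate bytes up to and including
            // the final byte, which lies in 0x40..=0x7E.
            Some('[') => {
                for n in chars.by_ref() {
                    if ('\x40'..='\x7e').contains(&n) {
                        break;
                    }
                }
            }
            // OSC: skip until BEL (0x07) or the string terminator ESC \.
            Some(']') => {
                while let Some(n) = chars.next() {
                    if n == '\x07' {
                        break;
                    }
                    if n == '\x1b' && chars.peek() == Some(&'\\') {
                        chars.next();
                        break;
                    }
                }
            }
            // Any other byte after ESC, or a trailing lone ESC: drop it.
            _ => {}
        }
    }
    out
}
```

With this sketch, filters would run against `plain` while the display path keeps using `raw`. Because the stripped text is never longer than the raw text, `String::with_capacity(input.len())` avoids reallocation during stripping.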
Reference
- ANSI escape sequences: ECMA-48 (ISO/IEC 6429)
- Existing parser: src/ansi/parser.rs
- Related PR: fix: remove leading space from log content after timestamp parsing #872 (fixed leading space issue)
Additional Notes
This is a separate issue from #871 (leading space after timestamp). While both affect regex filtering, they have different root causes and solutions.
This issue is marked for future consideration and is not blocking current functionality.