
fix(pii_filter): add comprehensive Rust implementation hardening and regression tests #3840

Merged
brian-hussey merged 17 commits into main from fix/pii-filter-regression-tests on Mar 27, 2026

@lucarlig
Collaborator

@lucarlig lucarlig commented Mar 24, 2026

📌 Summary

This PR hardens the Rust PII filter implementation with comprehensive validation, error handling, detection improvements, and regression test coverage to ensure robust security behavior. It also tightens loopback passthrough header filtering so internal loopback requests do not forward hop-by-hop, routing, or MCP session headers from the inbound client request.

🔁 Reproduction Steps

Issues were identified through internal security review and testing that revealed gaps in validation, detection patterns, error handling, and loopback header filtering.

🐞 Root Cause

The Rust PII filter implementation needed hardening in several areas:

  1. Mask strategy handling: Strategies and nested keys required proper preservation
  2. Detection patterns: Patterns needed expansion for better coverage
  3. Input validation: Missing validation and error handling in core logic
  4. Configuration limits: Resource limits needed proper bounds and validation
  5. Test coverage: Missing regression tests for edge cases
  6. Loopback passthrough filtering: Internal loopback requests still allowed transport and routing headers that should be regenerated or blocked by the gateway

💡 Fix Description

Implemented comprehensive improvements across the branch:

  1. Mask strategy preservation and nested key support (16a9930c0)

    • Added proper handling of mask strategies across detection types
    • Implemented nested JSON key support
    • Added tests for strategy preservation
  2. Comprehensive detection patterns and protection (b6860120d)

    • Expanded PII detection patterns for better coverage
    • Added pattern complexity validation
    • Implemented protection against pathological regex behavior
    • Added 400+ lines of detection logic
  3. Input validation and error handling (56e85f82e)

    • Added validation for masking inputs
    • Implemented proper error messages and handling
    • Added 90+ lines of validation logic
  4. Resource limits and config validation (e886de2a6)

    • Added configuration validation with proper bounds
    • Implemented resource limit enforcement
    • Added config validation tests
  5. Comprehensive test coverage (415ba377f)

    • Added 90+ lines of test coverage for edge cases
    • Implemented regression tests for detection gaps
    • Added error path testing
  6. Documentation (e77106295)

    • Documented protection limits and constraints
    • Added upper bounds to resource limit fields
    • Updated README with security considerations
  7. Loopback passthrough header hardening

    • Expanded the loopback skip list to drop hop-by-hop and routing headers such as Connection, Transfer-Encoding, TE, Trailer, Upgrade, Host, and Content-Length
    • Prevented inbound clients from influencing internal loopback request framing, routing, or MCP session propagation
    • Added deny-path coverage for the filtered header set

🧪 Verification

| Check | Command | Status |
|---|---|---|
| Lint suite | make lint | |
| Unit tests | make test | |
| Coverage ≥ 80 % | make coverage | |
| Rust tests | cargo test | |
| Manual regression no longer fails | | Verified all edge cases |

📐 MCP Compliance (if relevant)

  • Matches current MCP spec
  • No breaking change to MCP clients

✅ Checklist

  • Code formatted (make black isort pre-commit)
  • No secrets/credentials committed
  • Tests added for all changes
  • Documentation updated

@lucarlig lucarlig requested review from araujof and terylt as code owners March 24, 2026 14:49
@lucarlig lucarlig added the bug Something isn't working label Mar 24, 2026
@lucarlig lucarlig requested a review from jonpspri as a code owner March 24, 2026 14:49
@lucarlig lucarlig added security Improves security rust Rust programming labels Mar 24, 2026
@lucarlig lucarlig requested a review from dima-zakharov as a code owner March 24, 2026 14:49
@lucarlig lucarlig requested a review from crivetimihai as a code owner March 24, 2026 14:49
@lucarlig lucarlig force-pushed the fix/pii-filter-regression-tests branch 3 times, most recently from 3af63ae to b4b958b Compare March 24, 2026 15:34
Comment thread plugins_rust/pii_filter/src/config.rs Outdated
@sco3

sco3 commented Mar 25, 2026

I approve. I am putting some details here for the record; the findings are low priority.

Branch Review Findings

Branch: fix/pii-filter-regression-tests
Compared to: main
Review Date: March 25, 2026


Executive Summary

| Metric | Value |
|---|---|
| Files Changed | 13 |
| Insertions | 1,696 lines |
| Deletions | 129 lines |
| Commits | 10 |
| Critical Issues | 0 |
| Medium Issues | 1 |
| Low Issues | 4 |
| Positive Findings | 6 |

Verdict: APPROVE with minor fixes


Branch Overview

This branch focuses on three main areas:

  1. Loopback Passthrough Header Hardening (security deny-path)
  2. Rust PII Filter Improvements (SSN validation, resource limits, ReDoS protection)
  3. Test Coverage for edge cases and regression testing

Files Changed

| File | Changes | Purpose |
|---|---|---|
| mcpgateway/utils/passthrough_headers.py | +9 | Block hop-by-hop headers |
| plugins/pii_filter/pii_filter.py | +5 | Add resource limit config |
| plugins/pii_filter/pii_filter_rust.py | -19 | Simplify import logic |
| plugins/pii_filter/README.md | +26 | Document SSN validation |
| plugins_rust/pii_filter/README.md | +149 | Document detection coverage |
| plugins_rust/pii_filter/src/config.rs | +104 | Enforce resource limits |
| plugins_rust/pii_filter/src/detector.rs | +900 | SSN validation, error handling |
| plugins_rust/pii_filter/src/masking.rs | +132 | Range validation, UTF-8 safety |
| plugins_rust/pii_filter/src/patterns.rs | +226 | ReDoS protection, contextual matching |
| plugins_rust/pii_filter/benches/pii_filter.rs | +1 | Fix benchmark config |
| plugins_rust/pii_filter/python/pii_filter_rust/__init__.pyi | -1 | Cleanup |
| tests/unit/mcpgateway/plugins/plugins/pii_filter/test_pii_filter.py | +226 | Edge case coverage |
| tests/unit/mcpgateway/test_loopback_passthrough_headers.py | +17 | Deny-path tests |

Issues by Severity

🔴 CRITICAL: None

No show-stopping bugs or security vulnerabilities found.


🟡 MEDIUM: 1 Issue

1. Custom Pattern ReDoS Validation Incomplete

File: plugins_rust/pii_filter/src/patterns.rs (lines 324-362)
Function: validate_custom_pattern()

Problem: The validation counts total quantifiers but doesn't detect nested quantifiers, which are the primary cause of Regular Expression Denial of Service (ReDoS) attacks.

Current Validation:

let quantifiers = pattern.chars()
    .filter(|ch| matches!(ch, '*' | '+' | '?'))
    .count()
    + pattern.matches('{').count();
if quantifiers > MAX_QUANTIFIERS {  // 24
    return Err("too many quantifiers");
}
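The counting rule in the snippet above can be mirrored in a few lines of Python to see which patterns stay under the limit. This is only a sketch: `count_quantifiers` and `passes_quantifier_limit` are hypothetical helper names, and the limit of 24 is taken from the comment in the snippet.

```python
MAX_QUANTIFIERS = 24  # value quoted in the comment of the Rust snippet


def count_quantifiers(pattern: str) -> int:
    """Count '*', '+', '?' and '{' the same way the Rust snippet does."""
    return sum(ch in "*+?" for ch in pattern) + pattern.count("{")


def passes_quantifier_limit(pattern: str) -> bool:
    """A pattern passes when its total quantifier count is within the limit."""
    return count_quantifiers(pattern) <= MAX_QUANTIFIERS


for pattern in ["(a+)+", r"(\w+\s?)+", "a+b+c+d+"]:
    print(pattern, count_quantifiers(pattern), passes_quantifier_limit(pattern))
```

Because the rule only counts characters, a nested pattern and a flat pattern with the same number of quantifiers are indistinguishable to it.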

Dangerous Patterns That Pass Current Validation:

| Pattern | Quantifiers | Nesting | Risk |
|---|---|---|---|
| (a+)+ | 2 | 2 | 🔴 ReDoS |
| ((a+)+)+ | 3 | 3 | 🔴 ReDoS |
| (\w+\s?)+ | 3 | 2 | 🔴 ReDoS |
| ([A-Za-z0-9]+[-_]?)+ | 3 | 2 | 🔴 ReDoS |
| a+b+c+d+ | 4 | 1 (flat) | ✅ Safe |

Impact: A malicious or erroneous custom pattern can cause catastrophic backtracking, consuming CPU for seconds or minutes on small inputs.

Example Attack:

# This pattern passes current validation
pattern = r"(a+)+"

# Input: 30 'a' characters + '!'
text = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!"

# Processing time: ~1.2 seconds (exponential growth)
# Each additional character doubles the time

Fix Option A: Add Nested Quantifier Validation (Recommended):

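/// Computes the maximum quantifier nesting depth of a pattern.
/// A flat quantifier such as `a+` has depth 1; a quantified group that
/// already contains a quantifier, such as `(a+)+`, has depth 2.
fn calculate_quantifier_nesting_depth(pattern: &str) -> usize {
    // One entry per open group (index 0 is the top level): the deepest
    // quantifier nesting seen so far inside that group.
    let mut depths: Vec<usize> = vec![0];
    // Inner depth of the group just closed, when the previous token was `)`.
    let mut closed_group: Option<usize> = None;
    let mut after_open_paren = false;
    let mut in_char_class = false;
    let mut escaped = false;

    for ch in pattern.chars() {
        // `(?` starts group flags such as `(?:`; that `?` is not a quantifier.
        let is_group_flag = after_open_paren && ch == '?';
        after_open_paren = false;

        if escaped {
            escaped = false;
            closed_group = None;
            continue;
        }
        if ch == '\\' {
            escaped = true;
            continue;
        }
        if in_char_class {
            in_char_class = ch != ']';
            continue;
        }

        match ch {
            '*' | '+' | '?' | '{' if !is_group_flag => {
                // A quantifier sits one level above whatever it repeats.
                let depth = closed_group.take().unwrap_or(0) + 1;
                if let Some(top) = depths.last_mut() {
                    *top = (*top).max(depth);
                }
            }
            '(' => {
                depths.push(0);
                after_open_paren = true;
                closed_group = None;
            }
            ')' => {
                // Ignore an unbalanced `)` instead of popping the top level.
                closed_group = if depths.len() > 1 { depths.pop() } else { None };
                if let (Some(inner), Some(top)) = (closed_group, depths.last_mut()) {
                    *top = (*top).max(inner);
                }
            }
            _ => {
                in_char_class = ch == '[';
                closed_group = None;
            }
        }
    }

    depths.into_iter().max().unwrap_or(0)
}

// Add to validate_custom_pattern:
// Depth 1 allows flat quantifiers; depth 2 or more means a quantified
// group that itself contains a quantifier.
const MAX_NESTING_DEPTH: usize = 1;

let nesting = calculate_quantifier_nesting_depth(pattern);
if nesting > MAX_NESTING_DEPTH {
    return Err(format!(
        "Pattern has nested quantifiers (depth: {}, max: {}) - potential ReDoS",
        nesting, MAX_NESTING_DEPTH
    ));
}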

Fix Option B: Document Trusted-Input Assumption (Alternative):

If custom patterns are only added by trusted admins (not end users), document this assumption:

// SECURITY NOTE: Custom patterns are trusted input (added by admins only).
// This validation provides basic safeguards against typos and obvious errors.
// For untrusted pattern sources, use regex-automata DFA engine instead.

Tests to Add:

#[test]
fn test_rejects_nested_quantifiers() {
    assert!(validate_custom_pattern("(a+)+").is_err());
    assert!(validate_custom_pattern("((a+)+)+").is_err());
    assert!(validate_custom_pattern(r"(\w+\s?)+").is_err());
}

#[test]
fn test_accepts_flat_quantifiers() {
    assert!(validate_custom_pattern("a+b+c+").is_ok());
    assert!(validate_custom_pattern("EMP-[0-9]{6}").is_ok());
}

Effort: ~60 lines of code
Priority: Fix before merge OR document trusted-input assumption


🟢 LOW: 4 Issues

1. UTF-8 Boundary Error Message Lacks Diagnostic Info

File: plugins_rust/pii_filter/src/masking.rs (lines 58-87)
Function: validate_detection_ranges()

Current Code:

if !text.is_char_boundary(detection.start) || !text.is_char_boundary(detection.end) {
    return Err("Invalid detection range: offsets must align to UTF-8 boundaries".to_string());
}

Problem: Error message doesn't include which detection failed or what the actual byte offsets were, making debugging difficult.

Fix:

if !text.is_char_boundary(detection.start) || !text.is_char_boundary(detection.end) {
    return Err(format!(
        "Invalid detection range: offsets {}..{} must align to UTF-8 boundaries (text len: {})",
        detection.start, detection.end, text.len()
    ));
}

Effort: ~5 lines
Priority: Recommended
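To illustrate what the boundary check guards against, here is a rough Python equivalent of Rust's `str::is_char_boundary` operating on UTF-8 bytes. The function names and the `check_range` wrapper are hypothetical; only the message format is taken from the fix above.

```python
def is_char_boundary(data: bytes, index: int) -> bool:
    """Return True when `index` falls between two UTF-8 encoded characters."""
    if index < 0 or index > len(data):
        return False
    if index == 0 or index == len(data):
        return True
    # UTF-8 continuation bytes always have the bit pattern 10xxxxxx.
    return (data[index] & 0xC0) != 0x80


def check_range(text: str, start: int, end: int) -> None:
    """Raise ValueError with the offsets and text length when a range splits a character."""
    data = text.encode("utf-8")
    if not (is_char_boundary(data, start) and is_char_boundary(data, end)):
        raise ValueError(
            f"Invalid detection range: offsets {start}..{end} must align to "
            f"UTF-8 boundaries (text len: {len(data)})"
        )
```

For the text "héllo", byte offset 2 lands inside the two-byte "é", so `check_range("héllo", 0, 2)` raises and the message names both offsets and the byte length.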


2. Hash Output Length Change Undocumented

File: plugins_rust/pii_filter/src/masking.rs (line 218)

Change:

// Before: format!("[HASH:{}]", &format!("{:x}", result)[..8])
// After:  format!("[HASH:{}]", &format!("{:x}", result)[..16])

Impact:

  • Before: [HASH:abcd1234] (15 characters)
  • After: [HASH:abcd1234ef567890] (23 characters)

Downstream systems parsing masked output may break if they expect fixed-width fields.

Fix: Add to plugins_rust/pii_filter/README.md or CHANGELOG.md:

## Changelog

### [Unreleased]

#### Breaking Changes
- **Hash mask output length increased**: Hash strategy now produces 16-character 
  hex output instead of 8 characters for improved collision resistance.
  - Before: `[HASH:abcd1234]`
  - After: `[HASH:abcd1234ef567890]`
  - Migration: Update any regex parsers or fixed-width field extractors

Effort: ~10 lines documentation
Priority: Document before merge
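For downstream consumers, a parser that tolerates both widths is one way to handle the migration. The sketch below is illustrative only: it assumes SHA-256 as the digest (the snippet above shows only that a hex digest is truncated), and `hash_mask` and `HASH_MASK_RE` are hypothetical names.

```python
import hashlib
import re


def hash_mask(value: str, hex_len: int = 16) -> str:
    """Build a [HASH:...] token from a truncated hex digest."""
    # Assumption: SHA-256. The actual digest used by the plugin is not shown here.
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return f"[HASH:{digest[:hex_len]}]"


# Accept both the old 8-character and the new 16-character forms.
HASH_MASK_RE = re.compile(r"\[HASH:([0-9a-f]{8}|[0-9a-f]{16})\]")

old_style = hash_mask("alice@example.com", hex_len=8)
new_style = hash_mask("alice@example.com")
print(len(old_style), len(new_style))
```

Matching on the bracketed token rather than on a fixed column width keeps the parser working if the length changes again.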


3. Performance Test Generates Invalid SSNs

File: tests/unit/mcpgateway/plugins/plugins/pii_filter/test_pii_filter.py (lines 914-918)

Current Code:

for i in range(10000):
    area = (i % 799) + 100
    if area == 666:  # Only skips exactly 666
        area = 667
    lines.append(f"User {i}: SSN {area:03d}-45-6789, Email user{i}@example.com")

Problem: The generator only special-cases area 666 and does not check the SSA area rules explicitly:

  • 000: invalid
  • 666: invalid (the only value handled)
  • 900-999: invalid (not checked; avoided only because (i % 799) + 100 tops out at 898)

Fix:

def is_valid_ssn_area(area: int) -> bool:
    """Check if SSN area code is structurally valid per SSA rules."""
    return area != 0 and area != 666 and area < 900

lines = []
i = 0
while len(lines) < 10000:
    area = (i % 800) + 100  # Range 100-899
    if is_valid_ssn_area(area):
        lines.append(f"User {len(lines)}: SSN {area:03d}-45-6789, Email user{len(lines)}@example.com")
    i += 1

Effort: ~15 lines
Priority: Recommended for test accuracy


4. Cumulative Text Size Not Tracked in Nested Structures (Optional)

File: plugins_rust/pii_filter/src/detector.rs (lines 216-350)
Function: process_nested_internal()

Current Behavior: Each string is validated individually against max_text_bytes, but cumulative size across many strings is not tracked.

Attack Scenario:

# Each string is 1KB (passes individual check)
data = {"field_" + str(i): "x" * 1024 for i in range(2000)}
# Total: 2MB (may exceed intended memory budget)

Fix (Optional, More Invasive):

fn process_nested_internal(
    &self,
    py: Python,
    data: &Bound<'_, PyAny>,
    path: &str,
    depth: usize,
    cumulative_size: &mut usize,  // NEW parameter
) -> PyResult<(bool, Py<PyAny>, Py<PyAny>)> {
    // ...

    if let Ok(text) = data.extract::<String>() {
        *cumulative_size += text.len();
        if *cumulative_size > self.config.max_text_bytes {
            return Err(PyErr::new::<pyo3::exceptions::PyValueError, _>(
                "Cumulative text size exceeds maximum limit"
            ));
        }
        // ...
    }

    // Recursive calls pass cumulative_size
    let (val_modified, new_value, val_detections) =
        self.process_nested_internal(py, &value, &new_path, depth + 1, cumulative_size)?;
}

Why Optional: This is defense-in-depth. The existing per-string check catches most attacks. Only needed if memory exhaustion via many small strings is a documented threat model.

Effort: ~50 lines
Priority: Optional hardening
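The idea of a shared budget can be sketched in Python independently of the PyO3 code. This is a hypothetical helper, not the plugin's API: `check_cumulative_text` and `TextBudgetExceeded` are illustrative names, and only string values (not dict keys) are counted.

```python
from typing import Any


class TextBudgetExceeded(ValueError):
    """Raised when the total size of all strings exceeds the budget."""


def check_cumulative_text(data: Any, max_total_bytes: int) -> int:
    """Walk nested dicts and lists, summing the UTF-8 size of every string value."""
    total = 0
    stack = [data]
    while stack:
        item = stack.pop()
        if isinstance(item, str):
            total += len(item.encode("utf-8"))
            if total > max_total_bytes:
                raise TextBudgetExceeded(
                    f"Cumulative text size {total} exceeds limit {max_total_bytes}"
                )
        elif isinstance(item, dict):
            stack.extend(item.values())
        elif isinstance(item, (list, tuple)):
            stack.extend(item)
    return total


payload = {"field_" + str(i): "x" * 1024 for i in range(2000)}
print(check_cumulative_text(payload, max_total_bytes=4 * 1024 * 1024))
```

An explicit stack avoids Python recursion limits on deep structures; a real implementation would still need the separate depth and collection-size limits.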


Positive Findings

✅ 1. Excellent SSN Validation

File: plugins_rust/pii_filter/src/detector.rs (lines 1133-1146)

The Rust detector correctly implements SSA Publication No. 05-10033 rules:

fn is_valid_ssn(value: &str) -> bool {
    let digits: String = value.chars().filter(|c| c.is_ascii_digit()).collect();
    if digits.len() != 9 {
        return false;
    }

    let area = &digits[0..3];
    let group = &digits[3..5];
    let serial = &digits[5..9];

    area != "000" && area != "666" && area < "900" && group != "00" && serial != "0000"
}

Validates:

  • ✅ Area code cannot be 000
  • ✅ Area code cannot be 666
  • ✅ Area code cannot be 900-999
  • ✅ Group number cannot be 00
  • ✅ Serial number cannot be 0000

Impact: Reduces false positives on random 9-digit numbers.
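For readers working on the Python side, the same structural rules can be expressed as a short Python function. This is a direct port of the Rust snippet above for illustration, not code from the branch.

```python
def is_valid_ssn(value: str) -> bool:
    """Apply the structural SSN rules: area, group, and serial constraints."""
    digits = "".join(ch for ch in value if ch in "0123456789")
    if len(digits) != 9:
        return False
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    # Fixed-width digit strings compare lexicographically like numbers.
    return (
        area not in ("000", "666")
        and area < "900"
        and group != "00"
        and serial != "0000"
    )


for candidate in ["123-45-6789", "000-45-6789", "666-45-6789", "900-45-6789"]:
    print(candidate, is_valid_ssn(candidate))
```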


✅ 2. Strong Credit Card Validation

File: plugins_rust/pii_filter/src/detector.rs (lines 1148-1190)

Implements proper Luhn algorithm + card prefix validation:

fn passes_luhn(value: &str) -> bool {
    // ✓ Luhn checksum validation
    // ✓ Length check (13-19 digits)
    // ✓ Card prefix validation (Visa, MC, Amex, etc.)
}

Validates:

  • ✅ Luhn checksum
  • ✅ Card length (13-19 digits)
  • ✅ Known card prefixes (Visa 4, MC 51-55, Amex 34/37, etc.)

Impact: Prevents false positives on random 16-digit numbers.
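Since the Rust body is elided above, here is a minimal Python sketch of the Luhn checksum with the 13-19 digit length check. It deliberately leaves out the card prefix validation, and it is an illustration of the algorithm rather than the branch's implementation.

```python
def passes_luhn(value: str) -> bool:
    """Return True when the digits in `value` satisfy the Luhn checksum."""
    digits = [int(ch) for ch in value if ch in "0123456789"]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(passes_luhn("4111 1111 1111 1111"))
print(passes_luhn("4111 1111 1111 1112"))
```

The first value is a widely used Visa test number and passes; changing its last digit breaks the checksum.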


✅ 3. Contextual PII Detection

File: plugins_rust/pii_filter/src/patterns.rs (lines 36-175)

Built-in patterns require explicit context labels for ambiguous identifiers:

| PII Type | Required Context | Example Match | Example Non-Match |
|---|---|---|---|
| SSN | "SSN", "Social Security" | SSN: 123-45-6789 | Order: 123-45-6789 |
| BSN | "BSN", "Citizen Service Number" | My BSN is 123456789 | Invoice: 123456789 |
| Passport | "Passport", "Passport No" | Passport: AB123456 | ID: AB123456 |
| Bank Account | "Account", "Bank Account" | Account: 123456789 | Reference: 123456789 |

Impact: Significantly reduces false positives on generic identifiers.
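A context-gated pattern can be sketched with Python's `re` module. The regex below is an assumption made for illustration and is not the pattern shipped in patterns.rs; it requires an "SSN" or "Social Security" label within a short distance of the number.

```python
import re

# Hypothetical pattern: a context label, up to 20 non-digit characters, then the number.
SSN_WITH_CONTEXT = re.compile(
    r"\b(?:SSN|Social Security)\b[^0-9\n]{0,20}?(\d{3}-\d{2}-\d{4})\b",
    re.IGNORECASE,
)


def find_contextual_ssns(text: str) -> list[str]:
    """Return SSN-shaped values that appear next to an explicit context label."""
    return SSN_WITH_CONTEXT.findall(text)


print(find_contextual_ssns("SSN: 123-45-6789"))
print(find_contextual_ssns("Order: 123-45-6789"))
```

The trade-off is recall: an unlabeled SSN is intentionally not reported, which is what keeps order numbers and invoice references from being masked.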


✅ 4. Comprehensive Loopback Header Filtering

File: mcpgateway/utils/passthrough_headers.py (lines 552-568)

Blocks HTTP/1.1 hop-by-hop, routing, and framing headers, along with authentication and MCP session headers:

_LOOPBACK_SKIP_HEADERS: frozenset[str] = frozenset({
    "authorization",
    "connection",
    "content-type",
    "content-length",
    "host",
    "keep-alive",
    "mcp-session-id",
    "proxy-connection",
    "te",
    "trailer",
    "transfer-encoding",
    "upgrade",
    "x-mcp-session-id",
    "x-forwarded-internally",
})

Blocks:

  • ✅ Authentication headers (Authorization)
  • ✅ Hop-by-hop headers (Connection, Keep-Alive, Proxy-Connection, TE, Trailer, Transfer-Encoding, Upgrade)
  • ✅ Routing and framing headers (Host, Content-Length, Content-Type)
  • ✅ MCP-specific headers (MCP-Session-ID, X-MCP-Session-ID, X-Forwarded-Internally)

Impact: Prevents header injection attacks in loopback scenarios.
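How such a skip list is typically applied can be sketched as a case-insensitive filter over the inbound headers. The set contents are copied from the listing above; `filter_loopback_headers` is a hypothetical function name, not necessarily the one used in passthrough_headers.py.

```python
LOOPBACK_SKIP_HEADERS = frozenset({
    "authorization", "connection", "content-type", "content-length", "host",
    "keep-alive", "mcp-session-id", "proxy-connection", "te", "trailer",
    "transfer-encoding", "upgrade", "x-mcp-session-id", "x-forwarded-internally",
})


def filter_loopback_headers(headers: dict[str, str]) -> dict[str, str]:
    """Drop any header whose lowercased name is on the skip list."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in LOOPBACK_SKIP_HEADERS
    }


inbound = {
    "Authorization": "Bearer token",
    "Host": "attacker.example",
    "Transfer-Encoding": "chunked",
    "X-Tenant": "acme",
}
print(filter_loopback_headers(inbound))
```

Lowercasing before the lookup matters because HTTP header names are case-insensitive, so `Host` and `host` must be treated the same.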


✅ 5. Resource Limit Enforcement

File: plugins_rust/pii_filter/src/config.rs (lines 220-252)

Validates and enforces safe resource limits:

pub max_text_bytes: usize,      // Default: 10MB, Max: 100MB
pub max_nested_depth: usize,    // Default: 32, Max: 1000
pub max_collection_items: usize // Default: 4096, Max: 1,000,000

Validates:

  • ✅ Limits are within safe bounds
  • ✅ Rejects zero or negative limits
  • ✅ Enforced at every entry point (detect, mask, process_nested)

Impact: Prevents DoS via oversized inputs or deeply nested structures.
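The bounds check can be illustrated with a small Python dataclass. The defaults and maxima are taken from the comments above; the class name, the constant names, and the reading of "MB" as 1024 * 1024 bytes are assumptions for this sketch.

```python
from dataclasses import dataclass

# Upper bounds from the field comments above (MB assumed to mean 1024 * 1024 bytes).
UPPER_BOUNDS = {
    "max_text_bytes": 100 * 1024 * 1024,
    "max_nested_depth": 1000,
    "max_collection_items": 1_000_000,
}


@dataclass(frozen=True)
class ResourceLimits:
    max_text_bytes: int = 10 * 1024 * 1024
    max_nested_depth: int = 32
    max_collection_items: int = 4096

    def __post_init__(self) -> None:
        # Reject zero, negative, and over-limit values at construction time.
        for name, upper in UPPER_BOUNDS.items():
            value = getattr(self, name)
            if not 1 <= value <= upper:
                raise ValueError(f"{name} must be between 1 and {upper}, got {value}")


print(ResourceLimits())
```

Validating in the constructor means an invalid configuration fails at load time rather than on the first request that hits the limit.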


✅ 6. Detection Range Validation

File: plugins_rust/pii_filter/src/masking.rs (lines 58-87)

Comprehensive validation of detection ranges before masking:

fn validate_detection_ranges(text: &str, detections: &[Detection]) -> Result<(), String> {
    for detection in detections {
        // ✓ start <= end
        // ✓ end <= text length
        // ✓ UTF-8 character boundaries
        // ✓ Overlapping range detection
    }
}

Impact: Prevents panics and memory safety issues during masking.
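The ordering, bounds, and overlap checks named in the comments above can be sketched in Python as follows. This is a simplified illustration that takes plain (start, end) tuples and leaves out the UTF-8 boundary check; the signature is hypothetical.

```python
def validate_detection_ranges(text_len: int, ranges: list[tuple[int, int]]) -> None:
    """Raise ValueError for inverted, out-of-bounds, or overlapping ranges."""
    previous_end = 0
    # Sorting by start offset lets one pass detect any overlap.
    for start, end in sorted(ranges):
        if start < 0 or start > end:
            raise ValueError(f"Invalid detection range: {start}..{end}")
        if end > text_len:
            raise ValueError(
                f"Invalid detection range: {start}..{end} exceeds text len {text_len}"
            )
        if start < previous_end:
            raise ValueError(f"Overlapping detection range: {start}..{end}")
        previous_end = end


validate_detection_ranges(20, [(10, 15), (0, 5), (5, 10)])
```

Adjacent ranges such as 0..5 and 5..10 are accepted because the end offset is exclusive.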


Recommended Priority

Required Before Merge

  1. [MEDIUM] Add nested quantifier check in patterns.rs
    OR document that custom patterns are trusted-input only

Recommended Before Merge

  1. [LOW] Document hash length change in changelog

Nice to Have

  1. [LOW] Improve UTF-8 error message in masking.rs
  2. [LOW] Fix test SSN generation to skip invalid ranges
  3. [LOW] Add cumulative text size tracking (optional)

Testing Summary

Test Coverage Added

| Test File | New Tests | Coverage |
|---|---|---|
| test_pii_filter.py | +226 lines | SSN edge cases, mask strategies, performance |
| test_loopback_passthrough_headers.py | +17 lines | Deny-path regression tests |

Key Test Scenarios Covered

  • ✅ Structurally impossible SSNs (000, 666, 900-999 area codes)
  • ✅ BSN vs other 9-digit numbers
  • ✅ Mask strategy regression (partial vs redact vs hash)
  • ✅ AWS key detection edge cases
  • ✅ Nested structure processing
  • ✅ Large batch detection performance
  • ✅ Loopback header filtering deny-paths

Security Assessment

Attack Vectors Addressed

| Vector | Status | Mitigation |
|---|---|---|
| Header injection (loopback) | ✅ Mitigated | Comprehensive header filtering |
| ReDoS (custom patterns) | ⚠️ Partial | Length/quantifier limits, missing nesting check |
| DoS (oversized inputs) | ✅ Mitigated | Text size, depth, collection limits |
| DoS (deep nesting) | ✅ Mitigated | Max depth validation |
| False positives (SSN) | ✅ Mitigated | SSA structural validation |
| False positives (generic IDs) | ✅ Mitigated | Contextual detection |

Remaining Concerns

  1. Custom Pattern ReDoS (MEDIUM): Nested quantifiers not detected
    • Mitigation: Add nesting depth validation OR document trusted-input assumption

Conclusion

This branch demonstrates strong security engineering with:

  • ✅ Defense-in-depth header filtering
  • ✅ Robust input validation
  • ✅ Comprehensive test coverage
  • ✅ Proper error handling
  • ✅ SSA-compliant SSN validation
  • ✅ Contextual PII detection to reduce false positives

Overall Verdict: APPROVE with minor fixes

Required Action: Address the nested quantifier validation gap (Issue #1) before merging, or explicitly document that custom patterns are trusted-input only.


dima-zakharov
dima-zakharov previously approved these changes Mar 25, 2026
Collaborator

@dima-zakharov dima-zakharov left a comment


Looks good

@lucarlig
Collaborator Author

lucarlig commented Mar 25, 2026

Addressed the review follow-ups on the branch.

What changed:

  • Documented the custom-pattern trust boundary in plugins_rust/pii_filter/src/patterns.rs and both PII filter READMEs instead of adding a nested-quantifier rejector. The branch uses Rust regex, so matching stays linear-time, and the remaining limits are now described as guardrails for trusted admin-authored patterns.
  • Improved the UTF-8 boundary error in plugins_rust/pii_filter/src/masking.rs to include the failing byte offsets and text length, and added a Rust unit test for that path.
  • Documented the 16-hex-character HASH mask output in the Rust and plugin READMEs so downstream parsers have an explicit migration note.
  • Fixed the large-batch Python performance test so the generated SSNs skip invalid SSA area codes instead of only patching 666.

Verification:

  • cargo test masking::tests --manifest-path plugins_rust/pii_filter/Cargo.toml
  • cargo test patterns::tests --manifest-path plugins_rust/pii_filter/Cargo.toml
  • cargo fmt --manifest-path plugins_rust/pii_filter/Cargo.toml --check
  • uv run maturin develop --release --manifest-path plugins_rust/pii_filter/Cargo.toml
  • uv run pytest tests/unit/mcpgateway/plugins/plugins/pii_filter/test_pii_filter.py::TestRustPIIDetectorSpecific::test_large_batch_detection -q

I did not add the optional cumulative nested-text budget tracking in this PR. That one is more invasive, and the existing per-string size/depth/collection limits remain unchanged.

source:
https://docs.rs/regex/latest/regex/#untrusted-input

@dima-zakharov
Collaborator

dima-zakharov commented Mar 25, 2026

I gave the page content to Cline with the IBM Sonnet 4.5 model, and this is the resulting analysis:

CRITICAL CORRECTION: ReDoS Analysis for Rust Regex Crate

Summary: Traditional ReDoS Does NOT Apply to Rust's regex Crate

A review of the official Rust regex documentation shows that the ReDoS concern about nested quantifiers (Issue #1 in the review above) is based on a MISUNDERSTANDING of how the Rust regex engine works.

Key Facts from Rust regex Documentation:

  1. "This crate is meant to be able to run regex searches on untrusted haystacks without fear of ReDoS."

  2. "This crate differs from most (but not all) other regex engines in that it doesn't use unbounded backtracking to run a regex search."

  3. Guaranteed worst-case O(m * n) time complexity, where:

    • m = size of regex (after expansion)
    • n = length of haystack
  4. Uses finite automata engines:

    • Thompson NFA
    • Lazy DFA
    • One-pass DFA
    • Bounded backtracker (not unbounded!)
    • PikeVM
  5. No catastrophic backtracking: The regex crate explicitly does NOT suffer from catastrophic backtracking that causes traditional ReDoS attacks.

What This Means:

❌ BUSTED: Traditional ReDoS Concerns

Patterns like (a+)+ or (a*)* that cause catastrophic backtracking in PCRE, JavaScript, Python, etc. DO NOT cause ReDoS in Rust's regex crate.

These patterns will:

  • Compile successfully ✓
  • Run in guaranteed O(m * n) time ✓
  • NOT hang or take exponential time ✓

✅ VALID: Size Limit Concerns

The RegexBuilder::size_limit and custom pattern validation in validate_custom_pattern() serve a DIFFERENT purpose:

  1. Prevent exponential memory usage during compilation

    • Pattern a{5}{5}{5}{5}{5}{5} expands to a{15625} which is huge
  2. Keep m reasonable in O(m * n)

    • A very large m (even with guaranteed O(m * n)) can still be slow
    • But it's NOT exponential/catastrophic
  3. Control compile times

    • Regex compilation is O(m) but large m means longer compile time
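The arithmetic behind the a{5}{5}{5}{5}{5}{5} example is just a product of the stacked repetition counts. A small Python sketch makes that explicit; `expanded_repetitions` is a hypothetical helper and it assumes every `{n}` in the pattern is stacked on the same atom, which holds for this example but not for patterns in general.

```python
import re
from math import prod


def expanded_repetitions(pattern: str) -> int:
    """Multiply the counts of stacked `{n}` repetitions, e.g. a{5}{5} -> 25."""
    counts = [int(n) for n in re.findall(r"\{(\d+)\}", pattern)]
    return prod(counts)


print(expanded_repetitions("a{5}{5}{5}{5}{5}{5}"))
```

Six stacked repetitions of 5 give 5 ** 6 copies of the atom, which is why a short pattern string can still produce a large compiled program.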

@lucarlig lucarlig added wxo wxo integration release-fix Critical bugfix required for the release labels Mar 26, 2026
dima-zakharov
dima-zakharov previously approved these changes Mar 26, 2026
Collaborator

@dima-zakharov dima-zakharov left a comment


Fixes Applied
- Documented custom pattern trust boundary in plugins_rust/pii_filter/src/patterns.rs and READMEs
- Improved UTF-8 boundary error message in plugins_rust/pii_filter/src/masking.rs
- Documented hash mask output length change in READMEs
- Fixed SSN generation in performance tests to skip invalid SSA area codes
- Correction: Rust regex engine does not suffer from traditional ReDoS attacks

Recommendation
Ready to merge

@lucarlig lucarlig added release-fix Critical bugfix required for the release and removed release-fix Critical bugfix required for the release labels Mar 26, 2026
@lucarlig lucarlig force-pushed the fix/pii-filter-regression-tests branch from 3ced268 to 6905c12 Compare March 26, 2026 17:44
@lucarlig lucarlig force-pushed the fix/pii-filter-regression-tests branch from 6905c12 to a7b9c7e Compare March 27, 2026 09:13
Collaborator

@dawid-nowak dawid-nowak left a comment


As a suggestion: from looking at the code, it seems that all pattern matching is executed one-by-one. It might be good to try to parallelize the execution with Rayon/parallel streams.

Assuming that Rayon is going to work with Python, that is.

@lucarlig
Collaborator Author

> As a suggestion: from looking at the code, it seems that all pattern matching is executed one-by-one. It might be good to try to parallelize the execution with Rayon/parallel streams.
>
> Assuming that Rayon is going to work with Python, that is.

Given the plugin is currently in-process (and that’s not changing for now), adding parallelism would directly compete with mcp-gateway for resources and will impact its performance. I think we should consider both internal plugin parallelism and plugin-level parallelism (running multiple plugins in parallel), but I’d consider that out of scope for this PR

@lucarlig lucarlig force-pushed the fix/pii-filter-regression-tests branch from a7b9c7e to 46786b2 Compare March 27, 2026 10:57
lucarlig added 16 commits March 27, 2026 11:27
…t in Rust implementation

Signed-off-by: lucarlig <luca.carlig@ibm.com>
…ction to Rust implementation

Signed-off-by: lucarlig <luca.carlig@ibm.com>
…ing logic

Signed-off-by: lucarlig <luca.carlig@ibm.com>
…tector

Signed-off-by: lucarlig <luca.carlig@ibm.com>
…dge cases

Signed-off-by: lucarlig <luca.carlig@ibm.com>
…imit upper bounds

Signed-off-by: lucarlig <luca.carlig@ibm.com>

@ja8zyjits
Member

The Python part looks good to me. We need to make sure we have some E2E tests so that other plugins won't fail.

@brian-hussey brian-hussey merged commit 8071c6d into main Mar 27, 2026
35 checks passed
@brian-hussey brian-hussey deleted the fix/pii-filter-regression-tests branch March 27, 2026 14:06
@brian-hussey
Member

Using admin override to merge. PR approved by @dima-zakharov.


Labels

bug Something isn't working plugins release-fix Critical bugfix required for the release rust Rust programming security Improves security wxo wxo integration


6 participants