Skip to content

Commit 3ced268

Browse files
committed
fix: address pii filter review follow-ups
Signed-off-by: lucarlig <luca.carlig@ibm.com>
1 parent f7855a3 commit 3ced268

5 files changed

Lines changed: 52 additions & 12 deletions

File tree

plugins/pii_filter/README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,7 @@ A plugin for detecting and masking Personally Identifiable Information (PII) in
2525
### Masking Strategies
2626
- **REDACT** - Complete replacement with `[REDACTED]` or custom text
2727
- **PARTIAL** - Show partial info (e.g., `***-**-1234` for SSN, `j***e@example.com` for email)
28-
- **HASH** - Replace with hash value for consistency
28+
- **HASH** - Replace with a deterministic SHA-256-derived placeholder such as `[HASH:8f434346648f6b96]`
2929
- **TOKENIZE** - Replace with unique token for reversibility
3030
- **REMOVE** - Complete removal of PII
3131

@@ -212,7 +212,7 @@ To prevent Regular Expression Denial of Service (ReDoS) attacks, custom patterns
212212
- **Maximum alternations (`|`):** 16
213213
- **Maximum quantifiers (`*`, `+`, `?`, `{}`):** 24
214214

215-
These limits ensure that custom patterns cannot create pathological regex behavior that could consume excessive CPU resources. Patterns exceeding these limits will be rejected during configuration validation.
215+
Custom patterns are expected to be written by trusted operators in plugin configuration, not supplied by end users at request time. The Rust detector uses the `regex` crate's linear-time matching engine, and these limits add extra guardrails for maintainability and compilation cost. Patterns exceeding these limits will be rejected during configuration validation.
216216

217217
Test the custom pattern:
218218
```python

plugins_rust/pii_filter/README.md

Lines changed: 9 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -152,23 +152,30 @@ This section describes the current Rust detector behavior so users know what is
152152
**Covers**
153153
- User-defined regex patterns for organization-specific identifiers
154154
- Explicit per-pattern masking strategies
155-
- Guardrails that reject patterns that are too long or too complex
155+
- Guardrails that reject patterns that are too long or too complex for maintainable admin-authored configuration
156156

157157
**Does not cover**
158158
- Unlimited regex expressiveness
159159
- Automatic tuning of custom patterns for precision or recall
160160
- Protection against poor pattern choices that are syntactically valid but semantically too broad
161161

162+
Custom patterns are intended for trusted operators editing plugin configuration, not untrusted end-user input. The Rust implementation relies on the [`regex`](https://docs.rs/regex/latest/regex/) crate, which avoids catastrophic backtracking during matching, and then applies additional length and complexity limits to keep custom expressions readable and cheap to compile.
163+
162164
## Secret Detection
163165

164166
The Rust plugin also detects AWS keys and generic API-key style assignments, but secret formats tend to be environment-specific and evolve quickly. Treat those detectors as best-effort safeguards rather than exhaustive secret scanning, and use dedicated secret-scanning tooling if you need stronger guarantees.
165167

166168
## Security Notes
167169

168170
- Whitelist patterns are compiled case-insensitively.
169-
- Custom patterns must stay within basic length and complexity limits.
171+
- Custom patterns must stay within basic length and complexity limits and are meant for trusted admin-authored configuration.
170172
- Very large strings and oversized nested collections are rejected instead of being scanned indefinitely.
171173

174+
## Masking Notes
175+
176+
- `HASH` masking emits the first 16 hexadecimal characters of the SHA-256 digest, for example `[HASH:8f434346648f6b96]`.
177+
- Earlier releases emitted 8 hexadecimal characters. Update downstream parsers if they assumed the shorter fixed-width placeholder.
178+
172179
## Testing
173180

174181
```bash

plugins_rust/pii_filter/src/masking.rs

Lines changed: 26 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -79,9 +79,12 @@ fn validate_detection_ranges(
7979
}
8080

8181
if !text.is_char_boundary(detection.start) || !text.is_char_boundary(detection.end) {
82-
return Err(
83-
"Invalid detection range: offsets must align to UTF-8 boundaries".to_string(),
84-
);
82+
return Err(format!(
83+
"Invalid detection range: offsets {}..{} must align to UTF-8 boundaries (text len: {})",
84+
detection.start,
85+
detection.end,
86+
text.len()
87+
));
8588
}
8689

8790
ranges.push((detection.start, detection.end));
@@ -327,4 +330,24 @@ mod tests {
327330
let err = mask_pii(text, &detections, &config).unwrap_err();
328331
assert!(err.contains("Overlapping detection ranges"));
329332
}
333+
334+
#[test]
335+
fn test_mask_pii_reports_utf8_boundary_offsets() {
336+
let config = PIIConfig::default();
337+
let text = "Joé";
338+
let mut detections = HashMap::new();
339+
detections.insert(
340+
PIIType::Custom,
341+
vec![Detection {
342+
value: "o".to_string(),
343+
start: 3,
344+
end: 3,
345+
mask_strategy: MaskingStrategy::Redact,
346+
}],
347+
);
348+
349+
let err = mask_pii(text, &detections, &config).unwrap_err();
350+
assert!(err.contains("offsets 3..3"));
351+
assert!(err.contains(&format!("text len: {}", text.len())));
352+
}
330353
}

plugins_rust/pii_filter/src/patterns.rs

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -321,6 +321,12 @@ pub fn compile_patterns(config: &PIIConfig) -> Result<CompiledPatterns, String>
321321
})
322322
}
323323

324+
/// Validate admin-authored custom patterns before compilation.
325+
///
326+
/// These patterns come from trusted plugin configuration rather than end-user input.
327+
/// The Rust `regex` crate uses a linear-time engine without catastrophic backtracking,
328+
/// so these limits are lightweight guardrails for readability, compile cost, and
329+
/// obvious mistakes instead of a full regex sandbox.
324330
fn validate_custom_pattern(pattern: &str) -> Result<(), String> {
325331
const MAX_CUSTOM_PATTERN_LEN: usize = 256;
326332
const MAX_ALTERNATIONS: usize = 16;

tests/unit/mcpgateway/plugins/plugins/pii_filter/test_pii_filter.py

Lines changed: 9 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -913,11 +913,15 @@ def test_large_batch_detection(self):
913913

914914
# Generate 10,000 lines of text with PII
915915
lines = []
916-
for i in range(10000):
917-
area = (i % 799) + 100
918-
if area == 666:
919-
area = 667
920-
lines.append(f"User {i}: SSN {area:03d}-45-6789, Email user{i}@example.com")
916+
area = 100
917+
while len(lines) < 10000:
918+
if area != 666:
919+
i = len(lines)
920+
lines.append(f"User {i}: SSN {area:03d}-45-6789, Email user{i}@example.com")
921+
922+
area += 1
923+
if area >= 900:
924+
area = 100
921925
text = "\n".join(lines)
922926

923927
start = time.time()

0 commit comments

Comments
 (0)