Novel defense mechanism against explanation-poisoning attacks in agentic federated learning using causal graph auditing (GED-based) via SimGNN.
- Overview
- Why Bayesian Networks?
- System Architecture
- Deliberative Agent Design
- PoR Defense Mechanics
- Adversarial Attack Model
- Baseline Comparison (FedAvg)
- Streamlit Dashboard
- Quick Start
- Docker Setup
- Configuration Reference
- Project Structure
- Test Suite
- Threat Models
This project implements Causal Proof of Reasoning (PoR) — a novel server-side defense for Federated Learning that audits the causal reasoning structure submitted by each client alongside their model weights.
Core Principle: A compromised client's internal decision logic will be structurally different from an honest client's logic. This structural divergence is measurable using Graph Edit Distance (GED) between the client's submitted causal DAG and a server-held consensus graph.
Why this beats weight-based detection: Distributed Backdoor Attacks (DBA) [Xie et al., 2020] and other gradient manipulations can craft model weights that are statistically indistinguishable from honest clients. But the causal graph of a poisoned client must deviate from the true Bayesian Network structure — since explanation-poisoning corrupts the conditional relationships between features — making it detectable.
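To make this concrete, here is a toy exact-GED computation on two tiny DAGs using networkx (the project approximates GED with SimGNN because exact GED is NP-hard to compute):

```python
import networkx as nx

honest = nx.DiGraph([("smoke", "lung"), ("lung", "either")])
poisoned = nx.DiGraph([("lung", "either")])  # poisoning erased smoke → lung
poisoned.add_node("smoke")                   # same nodes, different structure

# One edge deletion separates the two structures, so the exact GED is 1
print(nx.graph_edit_distance(honest, poisoned))  # 1.0
```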
- NOTEARS-based causal discovery [Zheng et al., 2018] embedded in every FL client (PyTorch implementation)
- SimGNN Logic Validator [Bai et al., 2019] — Siamese GNN pre-trained to approximate GED on causal graphs
- FedNEAT Integration [Stanley & Miikkulainen, 2002] — robust neuroevolutionary strategy replacing traditional gradient-based updates for advanced multi-agent scenarios
- Momentum-blended consensus update — global graph evolves conservatively across rounds
- On-the-fly SimGNN fine-tuning — validator re-anchors after each consensus update
- Baseline FedAvg comparison [McMahan et al., 2017] — weight-divergence detection (cosine similarity) for benchmarking
The project uses two classical Bayesian Networks — ASIA [Lauritzen & Spiegelhalter, 1988] and ALARM [Beinlich et al., 1989] — as ground-truth causal structures, sampled via bnlearn:
| Property | Value |
|---|---|
| Nodes | 8 (asia, tub, smoke, lung, bronc, either, xray, dysp) |
| Arcs | 8 (known ground-truth structure) |
| Variable Type | Binary (0/1) |
| Classification Target | lung (Lung Cancer) |
| Samples | Configurable (default: 10,000) |
The Collider Test: ASIA encodes the v-structure tub → either ← lung — a fundamental causal pattern that tests whether NOTEARS correctly orients edges around colliders vs. forks.
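A recovered DAG can be checked for this v-structure in a couple of lines (illustrative helper, not project code):

```python
import networkx as nx

def recovers_asia_collider(dag: nx.DiGraph) -> bool:
    """True iff both edges are oriented into `either` (collider), not out of it."""
    return dag.has_edge("tub", "either") and dag.has_edge("lung", "either")
```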
| Property | Value |
|---|---|
| Nodes | 37 (ICU monitoring variables) |
| Arcs | 46 |
| Variable Type | Categorical (multi-state ordinal) |
| Classification Target | bp (Blood Pressure) |
| Parameters | 509 |
The ALARM (A Logical Alarm Reduction Mechanism) network models anesthesia complications — dense enough to stress-test PoR while remaining tractable on CPU.
Why not ISIC 2019 (images)? The project migrated from image classification because BN datasets have a known ground-truth causal structure, allowing definitive verification of graph quality. Named nodes (smoke, lung) also make the PoR graphs interpretable vs. abstract Feature_3 latents.
┌────────────────────────────────────────────────────────────┐
│ SERVER │
│ │
│ ┌─────────────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ Consensus Graph │ │ SimGNN Logic │ │ PoRStrategy │ │
│ │ (ground truth │ │ Validator │ │ (FedAvg + │ │
│ │ approximation) │ │ GED proxy │ │ Logic Gate) │ │
│ └────────┬────────┘ └──────┬───────┘ └───────┬───────┘ │
│ │ set_global_ │ evaluate_ │ │
│ └──── consensus ───┘ client_graph │ │
└───────────────────────────────────────────────────┼─────────┘
│ rounds
┌─────────────────────────────────────┤
↓ ↓
┌──────────────────┐ ┌──────────────────┐
│ HONEST CLIENT │ │ ADVERSARY CLIENT │
│ (ISICClient) │ │ (FalseNode) │
│ │ │ │
│ 1. Receive global│ │ 1. Receive global│
│ weights │ │ weights │
│ 2. Train MLP on │ │ 2. POISON batch: │
│ local data │ │ feat_0 = 0.0 │
│ 3. NOTEARS on │ │ label flipped │
│ raw features │ │ 3. Train on bad │
│ 4. Submit: │ │ data │
│ (weights, DAG) │ │ 4. NOTEARS gets │
│ │ │ crippled graph│
│ GED ≈ low ✅ │ │ GED > τ → ❌ │
└──────────────────┘ └──────────────────┘
Each honest client implements a three-module Deliberative Agent:
| Module | Class | Role |
|---|---|---|
| Perception | `DataLoader` | Consumes local data partition |
| Cognitive | `CognitiveModule` (NOTEARS) | Extracts causal DAG from latent features |
| Action | `ISICClient.fit()` | Packages weights + DAG, sends to server |
Custom in-house implementation (not a library wrapper) solving:

min_W (1/(2n))·‖X − X·W‖²_F + λ‖W‖₁   s.t.   h(W) = tr(e^{W∘W}) − d = 0

- Augmented Lagrangian outer loop updates ρ and α until h(W) < 1e-8
- Adam inner loop minimises the penalised objective
- Nodes are labelled with actual BN column names (e.g., `smoke`, `lung`)
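For reference, the acyclicity term h(W) can be evaluated directly with `torch.matrix_exp`; a minimal sketch, not the project's exact `CognitiveModule` code:

```python
import torch

def notears_acyclicity(W: torch.Tensor) -> torch.Tensor:
    """h(W) = tr(e^{W∘W}) − d; equals zero iff the weighted adjacency W is a DAG."""
    d = W.shape[0]
    return torch.trace(torch.matrix_exp(W * W)) - d  # W * W is the Hadamard square
```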
Instead of traditional gradient-based weight averaging, this project uses Federated NeuroEvolution of Augmenting Topologies (FedNEAT) to aggregate models safely without shape-mismatch errors.
Stage 1 — PoR Logic Gate:

```python
for client in submitted_clients:
    ged_score = simgnn(client.causal_graph, consensus_graph)
    if ged_score > tau:          # τ = core_logic.validator_threshold
        reject(client)           # corrupted topology → drop genome entirely
    else:
        accept(client)           # honest graph → candidate for crossover
```

Stage 2 — Topological Crossover (FedNEAT): Accepted clients do not average their multi-dimensional tensors directly. Instead, their neural network architectures are encapsulated as genomes:
- Connections are matched globally using Innovation Hashes.
- If an edge exists in multiple accepted genomes, its scalar weight is averaged.
- If a mutation introduces a novel structure on one client, it is inherited safely by the global model.
This ensures the surviving global model inherits only honest structural mutations, completely bypassing the parameter corruption caused by explanation poisoning.
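A minimal sketch of that matching rule, assuming each genome is a mapping from innovation hash to connection weight (the real logic lives in `server/fed_neat_strategy.py`):

```python
from collections import defaultdict

def topological_crossover(genomes: list[dict]) -> dict:
    """Average weights of connections matched by innovation hash;
    novel connections are inherited from whichever genome carries them."""
    buckets = defaultdict(list)
    for genome in genomes:                      # only PoR-accepted clients
        for innovation_hash, weight in genome.items():
            buckets[innovation_hash].append(weight)
    return {h: sum(ws) / len(ws) for h, ws in buckets.items()}
```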
After each round, the consensus graph is updated using a momentum-blended dual-threshold rule:
| Operation | Threshold | At momentum=0.85 (10 clients) |
|---|---|---|
| Keep existing edge | votes ≥ (1−m)·0.5·N | ≥ 0.75 votes → very sticky |
| Add new edge | votes ≥ (0.5+0.5·m)·N | ≥ 9.25 votes → near-unanimous |
`consensus_momentum` (0–1, configurable via GUI slider):
- High (0.9): Graph barely changes each round — stable, conservative
- Low (0.0): Pure 50% majority vote — aggressive updates
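In code, the two vote thresholds from the table work out as follows (a sketch of the rule, not the exact `aggregator.py` implementation):

```python
def edge_vote_thresholds(m: float, n_clients: int) -> tuple[float, float]:
    """Dual-threshold consensus rule: (keep-edge votes, add-edge votes)."""
    keep = (1 - m) * 0.5 * n_clients      # existing edges are sticky at high m
    add  = (0.5 + 0.5 * m) * n_clients    # new edges need near-unanimity
    return keep, add

print(edge_vote_thresholds(0.85, 10))  # (0.75, 9.25) — matches the table above
```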
Input: (Graph A, Graph B)
↓ GCN × 2 layers (hidden=128)
↓ GAT attention layer (2 heads)
↓ Mean + Max pooling (multi-pool)
↓ Concatenate [emb_A, emb_B]
↓ FC(256→128) → Dropout(0.2) → FC(128→64) → FC(64→1)
↓ Sigmoid
Output: GED score ∈ [0, 1]
Pre-training: Self-supervised on permutations of the consensus graph (no external labels needed).
Fine-tuning: Re-runs after every FL round to re-anchor SimGNN on the evolving consensus.
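A PyTorch Geometric sketch of this stack, assuming a per-head GAT width of 32 so that mean+max pooling of two graphs yields the 256-dim FC input shown above (the actual `logic_validator.py` may differ):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, GATConv, global_mean_pool, global_max_pool

class SimGNNSketch(nn.Module):
    def __init__(self, in_dim: int):
        super().__init__()
        self.gcn1 = GCNConv(in_dim, 128)
        self.gcn2 = GCNConv(128, 128)
        self.gat = GATConv(128, 32, heads=2)   # 2 heads × 32 → 64-dim node features
        self.head = nn.Sequential(
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),    # GED score ∈ [0, 1]
        )

    def embed(self, x, edge_index, batch):
        x = torch.relu(self.gcn1(x, edge_index))
        x = torch.relu(self.gcn2(x, edge_index))
        x = self.gat(x, edge_index)
        # mean + max multi-pooling → 128-dim graph embedding
        return torch.cat([global_mean_pool(x, batch),
                          global_max_pool(x, batch)], dim=1)

    def forward(self, a, b):  # a, b: torch_geometric.data.Batch
        emb = torch.cat([self.embed(a.x, a.edge_index, a.batch),
                         self.embed(b.x, b.edge_index, b.batch)], dim=1)
        return self.head(emb)
```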
Attack: 20% of each training batch has feature_column_0 = 0.0 and the label set to target_label.
Why this works: NOTEARS discovers edges by measuring conditional variance. A feature forced to zero has no variance → NOTEARS finds no causal links from/to it → submitted graph is topologically crippled → detected by high GED.
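A hedged sketch of that poisoning step (the real `FalseNode` in `adversary/poisoning.py` may differ in detail):

```python
import torch

def poison_batch(x: torch.Tensor, y: torch.Tensor,
                 trigger_idx: int = 0, target_label: int = 1,
                 fraction: float = 0.2) -> tuple[torch.Tensor, torch.Tensor]:
    """Zero the trigger feature and flip the label on a fraction of the batch."""
    x, y = x.clone(), y.clone()
    n = int(fraction * x.size(0))
    x[:n, trigger_idx] = 0.0   # kill all variance in feature_column_0
    y[:n] = target_label       # backdoor label flip
    return x, y
```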
Fixed adversary IDs: Clients [num_clients - num_false_nodes, ..., num_clients-1] are always adversaries (e.g., clients 25–29 for 5 adversaries out of 30). Designation is static across all rounds.
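The ID arithmetic is straightforward:

```python
num_clients, num_false_nodes = 30, 5
adversary_ids = list(range(num_clients - num_false_nodes, num_clients))
print(adversary_ids)  # [25, 26, 27, 28, 29] — static across all rounds
```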
`baseline_fedavg_sim.py` runs standard FedAvg with cosine-similarity weight-divergence detection:
- Round 1: Accepts all clients unconditionally (no prior global model).
- Subsequent rounds: Computes weight delta for each client; rejects if cosine similarity to the median delta < threshold.
- Outputs: Saved to `saved_models/baseline/simulation_logs.json` in a format compatible with the GUI comparison panel.
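Conceptually, the detection step reduces to something like this sketch (names are illustrative, not the script's actual API):

```python
import numpy as np

def cosine_reject(client_deltas: list[np.ndarray], threshold: float) -> list[bool]:
    """Flag clients whose weight delta diverges from the median delta."""
    median = np.median(np.stack(client_deltas), axis=0)
    flags = []
    for delta in client_deltas:
        denom = np.linalg.norm(delta) * np.linalg.norm(median) + 1e-12
        flags.append(float(delta @ median / denom) < threshold)  # True → reject
    return flags
```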
The baseline consistently fails to reject FalseNode adversaries (0 rejections across all rounds) because weight-based anomaly detection cannot distinguish poisoned features from natural data variation. This is the key empirical result demonstrating PoR's advantage.
Run with `streamlit run app.py`. Features:
All parameters are editable without touching params.yaml. Changes persist on save.
| Sidebar Control | What it Does |
|---|---|
| Dataset (ASIA / ALARM) | Switches entire simulation dataset |
| Num Clients / False Nodes / Rounds | Core FL simulation parameters |
| Validator Threshold (τ) | GED rejection threshold (slider 0–1) |
| Consensus Momentum | How conservatively graph updates (slider 0–1) |
| NOTEARS Max Iter / LR / L1 | NOTEARS hyperparameters |
| NOTEARS Edge Threshold | Prunes weak NOTEARS edges |
| SimGNN Epochs / LR / Batch | Pre-training hyperparameters |
- Ground-truth Bayesian Network graph (from bnlearn)
- Node/edge count, target variable description
| Button | Progress Tracking |
|---|---|
| 🌐 Generate True Consensus Graph | Loading → NOTEARS → Saved |
| 🚀 Train Logic Validator (SimGNN) | Epoch [50/500] → [100/500] … live |
| 🔥 Run Multi-Round Simulation | Round 1/N → Round 2/N … per round |
Each button streams subprocess output, updates the progress bar based on log markers, and shows full logs in a collapsible expander.
- Per-round bar chart: accepted vs. rejected clients
- Detection rate metrics
- Final consensus graph visualisation (PyVis interactive)
- GED score distribution
Side-by-side comparison panel:
- Rejection rate per round: PoR vs. baseline
- Total adversary detection rate comparison
- Summary table highlighting PoR's advantage
```bash
# 1. Clone and set up
git clone https://github.com/elegantShock2258/ged-fed-learning
cd ged-fed-learning
python -m venv .venv && source .venv/bin/activate

# 2. Install PyTorch (choose one):
# CPU only:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# GPU (CUDA 11.8):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# 3. Install remaining dependencies
pip install torch-geometric==2.7.0
pip install -r requirements.txt

# 4. Launch dashboard
streamlit run app.py
# → Open http://localhost:8501
```

Then use the GUI buttons in order:
- 🌐 Generate True Consensus Graph
- 🚀 Train Logic Validator (SimGNN)
- 🔥 Run Multi-Round Simulation
```bash
python server/generate_consensus.py   # Step 1
python server/train_simgnn.py         # Step 2
python federated_sim.py               # Step 3 (PoR)
python baseline_fedavg_sim.py         # Step 4 (baseline, optional)
```

No local Python required — works on any machine with Docker installed.
```bash
# First time (builds image, ~5-10 min):
docker compose up --build

# Subsequent runs (no rebuild):
docker compose up

# Background:
docker compose up -d
# → Open http://localhost:8501
```

| Host path | Container path | Purpose |
|---|---|---|
| `./saved_models/` | `/app/saved_models/` | Persist generated models across container restarts |
| `./params.yaml` | `/app/params.yaml` | Live config editing — changes take effect without rebuilding |
Uncomment the deploy.resources block in docker-compose.yml and ensure nvidia-container-toolkit is installed on the host. Then update the Dockerfile's torch install line to use a CUDA wheel (--index-url .../cu118).
All settings live in params.yaml and are also editable via the Streamlit sidebar.
| Key | Default | Description |
|---|---|---|
| `dataset.name` | `"asia"` | `"asia"` (8 nodes) or `"alarm"` (37 nodes) |
| `dataset.total_samples` | `10000` | Rows sampled from the Bayesian Network |
| `dataset.seed` | `42` | NumPy seed for reproducibility |
| Key | Default | Description |
|---|---|---|
| `core_logic.causal_edge_threshold` | `0.05` | Prune NOTEARS edges below this weight |
| `core_logic.l1_sparsity_penalty` | `0.0001` | L1 regularisation in NOTEARS |
| `core_logic.notears_lr` | `0.02` | NOTEARS Adam learning rate |
| `core_logic.notears_max_iter` | `200` | NOTEARS gradient iterations |
| `core_logic.validator_threshold` | `0.45` | GED threshold τ — below → accept, above → reject |
| `core_logic.consensus_momentum` | `0.85` | Consensus update conservatism (0 = aggressive, 1 = frozen) |
| Key | Default | Description |
|---|---|---|
| `core_logic.simgnn_epochs` | `500` | Training epochs |
| `core_logic.simgnn_lr` | `0.001` | Adam learning rate |
| `core_logic.simgnn_batch_size` | `32` | Graph pairs per batch |
| Key | Default | Description |
|---|---|---|
| `simulation.num_clients` | `30` | Total FL clients |
| `simulation.num_false_nodes` | `5` | Adversarial clients (fixed IDs) |
| `simulation.num_rounds` | `10` | FL rounds |
| `simulation.local_epochs` | `1` | Client local training epochs |
| `simulation.batch_size` | `32` | Client batch size |
| `simulation.client_lr` | `1e-4` | Client Adam learning rate |
| Key | Default | Description |
|---|---|---|
| `server.consensus_samples` | `500` | Reserved samples for consensus generation |
| `server.batch_size` | `32` | Server-side batch size |
ged-fed-learning/
├── app.py # Streamlit dashboard (all 6 sections)
├── federated_sim.py # PoR FL simulation entry point
├── baseline_fedavg_sim.py # Baseline FedAvg + cosine-similarity detection
├── params.yaml # Central config file
├── requirements.txt # Python dependencies (curated, no legacy packages)
├── Dockerfile # CPU-first, production-grade
├── docker-compose.yml # With healthcheck, restart, GPU docs
├── graphs.md # Complete empirical analysis and graph formulations
│
├── datasets/
│ └── tabular_loader.py # TabularBNDataset (bnlearn ASIA/ALARM)
│
├── client/
│ ├── models.py # 3-layer MLP (returns logits + raw features)
│ ├── agent.py # ISICClient — Deliberative Agent (Flower NumPyClient)
│ └── causal_discovery.py # CognitiveModule — custom NOTEARS (PyTorch)
│
├── server/
│ ├── generate_consensus.py # One-off: generates server-side consensus DAG
│ ├── train_simgnn.py # One-off: pre-trains SimGNN Logic Validator
│ ├── logic_validator.py # SimGNN + LogicValidator classes
│ ├── aggregator.py # GED logic validation and filtering
│ └── fed_neat_strategy.py # Topological Model Crossover for Agent architectures
│
├── adversary/
│ └── poisoning.py # FalseNode — feature poisoning + label flipping
│
├── graphs/
│ ├── run_all.py # Generates all 20 evaluation charts
│ ├── collect_per_round_data.py # Collects runtime graph trajectories
│ ├── eval_asr_mta.py # Computes precise Attack Success Rate
│ └── g01_*.py - g20_*.py # Individual rendering scripts for metrics
│
├── tests/
│ ├── conftest.py # Shared fixtures
│ ├── e2e_docker_test.sh # Automated Docker Integration Script
│ ├── unit/ # 10 test files (100% logic coverage)
│ └── functional/ # 4 end-to-end full-system workflows
│
├── saved_models/
│ ├── {dataset_name}/
│ │ ├── consensus_graph.gpickle # Initial + evolved consensus DAG
│ │ ├── simgnn_pretrained.pt # Pre-trained SimGNN weights
│ │ ├── global_model.pt # Final FL global model
│ │ ├── simulation_logs.json # Per-round PoR metrics
│ │ └── ged_scores.json # Per-round GED score distributions
│ └── baseline/
│ └── simulation_logs.json # Baseline FedAvg metrics
The project includes a comprehensive test suite with 103 passing tests covering core components of the PoR defense system and end-to-end functional workflows.
```bash
# Run all tests
source .venv/bin/activate
pytest tests/

# Run with coverage report
pytest tests/ --cov=. --cov-report=term-missing

# Run specific test module
pytest tests/unit/test_adversary_poisoning.py -v

# Run functional tests only (fast)
pytest tests/functional/ -k "not slow"

# Run functional tests including slow ones
pytest tests/functional/ --runslow

# Run tests with detailed output
pytest tests/ -v --tb=short
```

Overall coverage: 95% (of 962 filtered logical statements)
| Module | Coverage | Statements | Highlights |
|---|---|---|---|
| `adversary/poisoning.py` | 100% | 33 | FalseNode backdoor poisoning, label flipping |
| `client/models.py` | 100% | 105 | DynamicGenome (MLP) initialization, forward pass |
| `client/causal_discovery.py` | 98% | 54 | NOTEARS causal graph extraction, edge thresholding |
| `tests/unit/test_adversary_poisoning.py` | 100% | 83 | 10 tests: FalseNode poisoning mechanics |
| `tests/functional/test_full_simulation.py` | 100% | 16 | 5 tests: End-to-end simulation validation |
| `server/fed_neat_strategy.py` | 95% | 277 | FedNEAT aggregation, logic validation, fine-tune hooks |
| `server/aggregator.py` | 95% | 231 | GED-based client filtering, consensus updates |
| `server/logic_validator.py` | 93% | 128 | SimGNN acceptance/rejection logic, threshold bounds |
| `client/agent.py` | 100% | 98 | FL agent tracking evolution via NEAT genome |
1. Adversarial Attack Tests (test_adversary_poisoning.py)
- ✅ 10 tests — All passing
- FalseNode initialization with poison parameters
- Backdoor feature zeroing validation
- Label flipping to target class
- Fitness evaluation with poisoned batches
- Integration: Complete adversarial setup
Example:
```python
def test_false_node_poisons_trigger_feature(mock_device, sample_data_loader):
    """Test trigger feature remains in valid range."""
    attacker = FalseNode(
        cid="adv_0",
        train_loader=sample_data_loader,
        test_loader=sample_data_loader,
        device=mock_device,
        feature_names=["feat_0", ..., "feat_4"],
        target_label=1
    )
    assert 0 <= attacker.trigger_feature_idx < 5
```

2. Server Integration Tests (test_server_integration.py)
- ✅ 9 tests — All passing
- Model directory creation
- Consensus graph serialization
- NOTEARS parameter validation
- SimGNN initialization & training methods
- Configuration handling (YAML, device selection)
3. Client Logic Tests (test_models.py)
- ✅ 8 tests — All passing (100% coverage)
- MLP shape validation (batch norm, dropout)
- Data type consistency (float32)
- Forward pass output shapes
4. Causal Discovery Tests (test_causal_discovery.py)
- ✅ 9 tests — 95% coverage
- NOTEARS zero-input handling
- Node naming consistency
- DAG edge thresholding
5. Aggregator Tests (test_aggregator_comprehensive.py)
- ✅ 18 tests — 92% coverage
- GED-based acceptance thresholds
- Consensus momentum blending
- JSON persistence of GED scores
6. FedNEAT Strategy Tests (test_fed_neat_strategy_comprehensive.py)
- ✅ 22 tests — 95% coverage
- Aggregate fit with PoR gate
- Dynamic threshold updates
- Graph persistence (honest vs. rejected)
7. Tabular Loader Tests (test_tabular_loader.py)
- ✅ 2 tests — 88% coverage
- ASIA/ALARM dataset loading
- Feature dimensionality
Full Simulation Tests (tests/functional/test_full_simulation.py)
- ✅ 3 tests — Fast integration tests for complete workflows (5 total, 2 slow)
- Output file structure verification
- Simulation logs and GED scores validation
- Adversary detection in functional context
- End-to-end PoR simulation validation (marked @pytest.mark.slow)
- Baseline FedAvg simulation validation (marked @pytest.mark.slow)
Simulation Scripts as Functional Tests:
- `federated_sim.py` — Complete PoR FL workflow (10 rounds)
- `baseline_fedavg_sim.py` — Baseline FedAvg comparison
- These serve as the primary functional tests, running full simulations
| Attack Type | Test Module | Key Test | Status |
|---|---|---|---|
| Feature Poisoning | `test_adversary_poisoning.py` | `test_false_node_poisons_trigger_feature` | ✅ PASS |
| Label Flipping | `test_adversary_poisoning.py` | `test_false_node_targets_specific_label` | ✅ PASS |
| Fitness Evaluation | `test_adversary_poisoning.py` | `test_evaluate_fitness_returns_numerical_score` | ✅ PASS |
| Server Integration | `test_server_integration.py` | `test_get_model_dir_creates_valid_path` | ✅ PASS |
| Consensus Generation | `test_server_integration.py` | `test_consensus_graph_structure` | ✅ PASS |
======================== 103 passed, 1 skipped ===========================
PASSING TEST MODULES:
✅ test_adversary_poisoning.py
✅ test_server_integration.py
✅ test_models.py
✅ test_causal_discovery.py
✅ test_aggregator_comprehensive.py
✅ test_fed_neat_strategy_comprehensive.py
✅ test_server_logic.py
✅ test_tabular_loader.py
✅ test_client_agent.py
✅ test_full_simulation.py
✅ test_imports.py
✅ test_simulations.py
SKIPPED (non-critical):
⚠️ test_tabular_loader.py::1 test (skips downloading big external payloads)
Note: All core PoR logic is fully validated via passing tests.
Functional tests marked with @pytest.mark.slow can be run with --runslow.
This project features a fully automated End-to-End (E2E) Docker Integration Test pipeline located in .github/workflows/e2e-integration.yml.
Whenever code is pushed to main or fed-neat-evolution, GitHub Actions automatically:
- Provisions a fresh Ubuntu VM.
- Dynamically configures `params.yaml` to spin up an accelerated 3-client (1 adversary) FedNEAT environment.
- Bootstraps the pipeline from scratch within Docker (proving the repository works natively "out-of-the-box" for any researcher without local dependency caching).
- Validates that the entire complex execution chain finishes successfully and materialises critical evaluation artifacts (consensus graph, GED scores, models).
Run it manually on your local system in under 90 seconds to verify environment stability:
```bash
./tests/e2e_docker_test.sh
```

Full analysis, abbreviation glossary, formulas, per-graph conclusions, and paper references: → `graphs.md`
A complete suite of 20 publication-quality evaluation graphs is generated by the graphs/ pipeline. They cover: detection performance, causal graph quality, model convergence, Byzantine fault tolerance, and attack success rate — comparing the Causal PoR defense against the Baseline FedAvg + Cosine method and published SOTA defenses (Krum, FLAME, FoolsGold, Trimmed Mean).
```bash
# Activate virtual environment
source .venv/bin/activate

# Step 1 — Generate all 20 graphs from existing saved simulation data
python graphs/run_all.py

# Step 2 (recommended) — Compute exact MTA + ASR evaluation metrics
python graphs/eval_asr_mta.py
python graphs/run_all.py                 # re-run to embed exact values in G18 & G20

# Step 3 (optional) — Get per-round GED + consensus snapshots for G17 & G19
python graphs/collect_per_round_data.py  # patches simulation (one-time)
python federated_sim.py                  # re-run simulation
python graphs/run_all.py                 # regenerate with exact trajectories

# Revert simulation patch
python graphs/collect_per_round_data.py --revert
```

All output PNGs are saved to `graphs/G*.png` at 300 DPI.
| # | Output File | Blueprint Metric | Data Source | Status |
|---|---|---|---|---|
| G01 | `G01_per_round_acceptance_bars.png` | Per-round accepted/rejected clients | `simulation_logs.json` | ✅ Real data |
| G02 | `G02_ged_score_distribution.png` | GED score distributions: Honest vs. Adversary | `ged_scores.json` | ✅ Real data |
| G03 | `G03_roc_curve.png` | ROC curve + AUC of GED detector | `ged_scores.json` | ✅ Real data |
| G04 | `G04_threshold_sensitivity.png` | ADR and FPR vs. detection threshold τ sweep | `ged_scores.json` | ✅ Real data |
| G05 | `G05_cumulative_suppression.png` | Cumulative adversary suppression + HCPR/round | `simulation_logs.json` | ✅ Real data |
| G06 | `G06_simgnn_speedup_benchmarks.png` | SimGNN vs. A* GED: runtime speedup + MSE | Published benchmarks [Bai et al., 2019] | ✅ Published |
| G07 | `G07_loss_convergence_efficiency.png` | Baseline loss curve + PoR aggregation efficiency | `baseline/simulation_logs.json` | ✅ Real data |
| G08 | `G08_consensus_jaccard_groundtruth.png` | Consensus graph Jaccard vs. ASIA ground truth | `consensus_graph.gpickle` | ✅ Real data |
| G09 | `G09_radar_detection_metrics.png` | Radar: Precision / Recall / F1 / ADR / Specificity | `ged_scores.json` | ✅ Real data |
| G10 | `G10_genome_architecture.png` | FedNEAT evolved genome topology (Round 15) | `realtime_state.json` | ✅ Real data |
| G11 | `G11_asia_ground_truth_vs_consensus.png` | ASIA ground truth DAG vs. PoR consensus DAG | `consensus_graph.gpickle` | ✅ Real data |
| G12 | `G12_ged_score_per_client.png` | Per-client GED scatter + TP/FP/TN/FN confusion | `ged_scores.json` | ✅ Real data |
| G13 | `G13_rejected_edge_diff.png` | Adversary's missing/extra edges vs. consensus | `rejected_edge_diff.json` | ✅ Real data |
| G14 | `G14_defense_comparison_table.png` | P/R/F1 comparison: PoR vs. 5 SOTA defenses | `ged_scores.json` + literature | ✅ Real + lit. |
| G15 | `G15_notears_edge_analysis.png` | NOTEARS edge quality: SHD, FDR, P/R per client | `rejected_edge_diff.json` | ⚠️ |
| G16 | `G16_byzantine_tolerance.png` | Byzantine breakdown point + EAE stability | `simulation_logs.json` + theory | ✅ Real data |
| G17 | `G17_multiround_ged_trend.png` | Multi-round GED trajectory: honest vs. adversary | `ged_scores.json` (reconstructed) | ⚠️ |
| G18 | `G18_main_task_accuracy.png` | Main Task Accuracy (MTA) vs. round | `eval_results.json` (auto) | ⚠️ |
| G19 | `G19_consensus_jaccard_rounds.png` | Consensus Jaccard per round (convergence curve) | Per-round consensus files | ⚠️ |
| G20 | `G20_attack_success_rate.png` | Attack Success Rate (ASR): PoR vs. SOTA | `eval_results.json` (auto) | ⚠️ |

Legend: ✅ Uses real simulation data directly · ⚠️ See `graphs.md` for how to obtain exact values.
| Script | Purpose |
|---|---|
| `graphs/run_all.py` | Master runner — generates all 20 graphs in sequence |
| `graphs/style_config.py` | Shared colour palette, fonts, DPI, and data loaders |
| `graphs/collect_per_round_data.py` | Patches simulation to save per-round GED + consensus |
| `graphs/eval_asr_mta.py` | Post-simulation MTA + ASR evaluation |
| `graphs/g01_*.py` – `graphs/g20_*.py` | Individual graph generation scripts |
| Attack | Method | PoR Detection | Baseline Detection |
|---|---|---|---|
| Feature Poisoning | Zero out feature column each batch | ✅ High GED (missing edges in DAG) | ❌ Weights look normal |
| Label Flipping | Flip 20% of labels to target class | ✅ Corrupted graph topology | ❌ Small weight delta |
| Explanation Poisoning | Submit fake/random DAG directly | ✅ SimGNN detects divergence | ❌ Not graph-aware |
| Distributed Backdoor (DBA) [Xie et al., 2020] | Each client injects partial trigger | ✅ Structural auditing catches combined dependency | ❌ Each client looks "normal" individually |
Defense & Attack Mechanics:
McMahan, B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2017).
Communication-Efficient Learning of Deep Networks from Decentralized Data
Advances in Artificial Intelligence and Statistics (AISTATS).
Xie, C., Huang, K., Chen, P. Y., & Li, B. (2020).
DBA: Distributed Backdoor Attacks against Federated Learning
International Conference on Learning Representations (ICLR).
Graph & Logic Validation:
Zheng, X., Aragam, B., Ravikumar, P., & Xing, E. P. (2018).
DAGs with NO TEARS: Continuous optimization for structure learning
Advances in Neural Information Processing Systems, 31.
Bai, Y., Ding, H., Bian, S., Chen, T., Sun, Y., & Wang, W. (2019).
SimGNN: A Neural Network Approach to Fast Graph Similarity Computation
Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM).
Bayesian Networks & Neuroevolution:
Lauritzen, S. L., & Spiegelhalter, D. J. (1988).
Local computations with probabilities on graphical structures and their application to expert systems
Journal of the Royal Statistical Society: Series B (Methodological), 50(2), 157–224.
Beinlich, I. A., Suermondt, H. J., Chavez, R. M., & Cooper, G. F. (1989).
The ALARM Monitoring System: A Case Study with Two Probabilistic Inference Techniques for Belief Networks
Proceedings of the 2nd European Conference on Artificial Intelligence in Medicine (AIME 89).
Stanley, K. O., & Miikkulainen, R. (2002).
Evolving Neural Networks through Augmenting Topologies
Evolutionary Computation, 10(2), 99–127.