Skip to content

elegantShock2258/ged-fed-learning

Repository files navigation

🛡️ Causal Proof of Reasoning — Federated Learning Defense

Novel defense mechanism against explanation-poisoning attacks in agentic federated learning using causal graph auditing (GED-based) via SimGNN.


Table of Contents

  1. Overview
  2. Why Bayesian Networks?
  3. System Architecture
  4. Deliberative Agent Design
  5. PoR Defense Mechanics
  6. Adversarial Attack Model
  7. Baseline Comparison (FedAvg)
  8. Streamlit Dashboard
  9. Quick Start
  10. Docker Setup
  11. Configuration Reference
  12. Project Structure
  13. Test Suite
  14. Threat Models

Overview

This project implements Causal Proof of Reasoning (PoR) — a novel server-side defense for Federated Learning that audits the causal reasoning structure submitted by each client alongside their model weights.

Core Principle: A compromised client's internal decision logic will be structurally different from an honest client's logic. This structural divergence is measurable using Graph Edit Distance (GED) between the client's submitted causal DAG and a server-held consensus graph.

Why this beats weight-based detection: Distributed Backdoor Attacks (DBA) [Xie et al., 2020] and other gradient manipulations can craft model weights that are statistically indistinguishable from honest clients. But the causal graph of a poisoned client must deviate from the true Bayesian Network structure — since explanation-poisoning corrupts the conditional relationships between features — making it detectable.

Key Contributions

  • NOTEARS-based causal discovery [Zheng et al., 2018] embedded in every FL client (PyTorch implementation)
  • SimGNN Logic Validator [Bai et al., 2019] — Siamese GNN pre-trained to approximate GED on causal graphs
  • FedNEAT Integration [Stanley & Miikkulainen, 2002] — robust neuroevolutionary strategy replacing traditional gradient-based updates for advanced multi-agent scenarios
  • Momentum-blended consensus update — global graph evolves conservatively across rounds
  • On-the-fly SimGNN fine-tuning — validator re-anchors after each consensus update
  • Baseline FedAvg comparison [McMahan et al., 2017] — weight-divergence detection (cosine similarity) for benchmarking

Why Bayesian Networks?

The project utilizes two classical Bayesian Networks as foundational causal structures, inspired by foundational Probabilistic Graphical Model research, sampled via bnlearn:

ASIA Network — Unit Testing Dataset (Lauritzen & Spiegelhalter, 1988)

Property Value
Nodes 8 (asia, tub, smoke, lung, bronc, either, xray, dysp)
Arcs 8 (known ground-truth structure)
Variable Type Binary (0/1)
Classification Target lung (Lung Cancer)
Samples Configurable (default: 10,000)

The Collider Test: ASIA encodes the v-structure tub → either ← lung — a fundamental causal pattern that tests whether NOTEARS correctly orients edges around colliders vs. forks.

ALARM Network — Full Simulation Dataset (Beinlich et al., 1989)

Property Value
Nodes 37 (ICU monitoring variables)
Arcs 46
Variable Type Categorical (multi-state ordinal)
Classification Target bp (Blood Pressure)
Parameters 509

The ALARM (A Logical Alarm Reduction Mechanism) network models anesthesia complications — dense enough to stress-test PoR while remaining tractable on CPU.

Why not ISIC 2019 (images)? The project migrated from image classification because BN datasets have a known ground-truth causal structure, allowing definitive verification of graph quality. Named nodes (smoke, lung) also make the PoR graphs interpretable vs. abstract Feature_3 latents.


System Architecture

┌────────────────────────────────────────────────────────────┐
│                         SERVER                              │
│                                                             │
│  ┌─────────────────┐  ┌──────────────┐  ┌───────────────┐  │
│  │ Consensus Graph │  │ SimGNN Logic │  │ PoRStrategy   │  │
│  │ (ground truth   │  │ Validator    │  │ (FedAvg +     │  │
│  │  approximation) │  │ GED proxy    │  │  Logic Gate)  │  │
│  └────────┬────────┘  └──────┬───────┘  └───────┬───────┘  │
│           │   set_global_    │  evaluate_        │          │
│           └──── consensus ───┘  client_graph     │          │
└───────────────────────────────────────────────────┼─────────┘
                                                    │ rounds
              ┌─────────────────────────────────────┤
              ↓                                     ↓
   ┌──────────────────┐                  ┌──────────────────┐
   │  HONEST CLIENT   │                  │ ADVERSARY CLIENT │
   │  (ISICClient)    │                  │ (FalseNode)      │
   │                  │                  │                  │
   │ 1. Receive global│                  │ 1. Receive global│
   │    weights       │                  │    weights       │
   │ 2. Train MLP on  │                  │ 2. POISON batch: │
   │    local data    │                  │    feat_0 = 0.0  │
   │ 3. NOTEARS on    │                  │    label flipped │
   │    raw features  │                  │ 3. Train on bad  │
   │ 4. Submit:       │                  │    data          │
   │   (weights, DAG) │                  │ 4. NOTEARS gets  │
   │                  │                  │    crippled graph│
   │ GED ≈ low ✅     │                  │ GED > τ → ❌     │
   └──────────────────┘                  └──────────────────┘

Deliberative Agent Design

Each honest client implements a three-module Deliberative Agent:

Module Class Role
Perception DataLoader Consumes local data partition
Cognitive CognitiveModule (NOTEARS) Extracts causal DAG from latent features
Action ISICClient.fit() Packages weights + DAG, sends to server

NOTEARS Implementation (Custom PyTorch)

Custom in-house implementation (not a library wrapper) solving:

min_W  0.5/n · ‖X - X·W‖² + λ‖W‖₁    s.t.   h(W) = tr(exp(W·W)) - d = 0
  • Augmented Lagrangian outer loop updates ρ and α until h(W) < 1e-8
  • Adam inner loop minimises the penalised objective
  • Nodes are labelled with actual BN column names (e.g., smoke, lung)

PoR Defense Mechanics

Two-Stage Aggregation (FedNEAT Strategy)

Instead of traditional gradient-based weight averaging, this project uses Federated NeuroEvolution of Augmenting Topologies (FedNEAT) to aggregate models safely without shape-mismatch errors.

Stage 1 — PoR Logic Gate:

for client in submitted_clients:
    ged_score = SimGNN(client.causal_graph, consensus_graph)
    if ged_score > τ:
        REJECT(client)    # corrupted topology → drop genome entirely
    else:
        ACCEPT(client)    # honest graph → candidate for crossover

Stage 2 — Topological Crossover (FedNEAT): Accepted clients do not average their multi-dimensional tensors directly. Instead, their neural network architectures are encapsulated as genomes:

  1. Connections are matched globally using Innovation Hashes.
  2. If an edge exists in multiple accepted genomes, its scalar weight is averaged.
  3. If a mutation introduces a novel structure on one client, it is inherited safely by the global model.

This ensures the surviving global model inherits only honest structural mutations, bypassing completely the parameter corruption caused by explanation poisoning.

Consensus Graph Evolution

After each round, the consensus graph is updated using a momentum-blended dual-threshold rule:

Operation Threshold At momentum=0.85 (10 clients)
Keep existing edge votes ≥ (1-m)·0.5·N ≥ 0.75 votes → very sticky
Add new edge votes ≥ (0.5+0.5·m)·N ≥ 9.25 votes → near-unanimous

consensus_momentum (0–1, configurable via GUI slider):

  • High (0.9): Graph barely changes each round — stable, conservative
  • Low (0.0): Pure 50% majority vote — aggressive updates

SimGNN Architecture

Input: (Graph A, Graph B)
   ↓ GCN × 2 layers (hidden=128)
   ↓ GAT attention layer (2 heads)
   ↓ Mean + Max pooling (multi-pool)
   ↓ Concatenate [emb_A, emb_B]
   ↓ FC(256→128) → Dropout(0.2) → FC(128→64) → FC(64→1)
   ↓ Sigmoid
Output: GED score ∈ [0, 1]

Pre-training: Self-supervised on permutations of the consensus graph (no external labels needed).
Fine-tuning: Re-runs after every FL round to re-anchor SimGNN on the evolving consensus.


Adversarial Attack Model

FalseNode — Targeted Feature Poisoning

Attack: 20% of each training batch has feature_column_0 = 0.0 and the label set to target_label.

Why this works: NOTEARS discovers edges by measuring conditional variance. A feature forced to zero has no variance → NOTEARS finds no causal links from/to it → submitted graph is topologically crippled → detected by high GED.

Fixed adversary IDs: Clients [num_clients - num_false_nodes, ..., num_clients-1] are always adversaries (e.g., clients 25–29 for 5 adversaries out of 30). Designation is static across all rounds.


Baseline Comparison (FedAvg)

baseline_fedavg_sim.py runs standard FedAvg with cosine-similarity weight divergence detection:

  • Round 1: Accepts all clients unconditionally (no prior global model).
  • Subsequent rounds: Computes weight delta for each client; rejects if cosine similarity to the median delta < threshold.
  • Outputs: Saved to saved_models/baseline/simulation_logs.json in a format compatible with the GUI comparison panel.

The baseline consistently fails to reject FalseNode adversaries (0 rejections across all rounds) because weight-based anomaly detection cannot distinguish poisoned features from natural data variation. This is the key empirical result demonstrating PoR's advantage.


Streamlit Dashboard

Run with streamlit run app.py. Features:

Section 1 — Configuration Sidebar

All parameters are editable without touching params.yaml. Changes persist on save.

Sidebar Control What it Does
Dataset (ASIA / ALARM) Switches entire simulation dataset
Num Clients / False Nodes / Rounds Core FL simulation parameters
Validator Threshold (τ) GED rejection threshold (slider 0–1)
Consensus Momentum How conservatively graph updates (slider 0–1)
NOTEARS Max Iter / LR / L1 NOTEARS hyperparameters
NOTEARS Edge Threshold Prunes weak NOTEARS edges
SimGNN Epochs / LR / Batch Pre-training hyperparameters

Section 2 — Dataset Overview

  • Ground-truth Bayesian Network graph (from bnlearn)
  • Node/edge count, target variable description

Section 3 — Action Buttons (with live progress bars)

Button Progress Tracking
🌐 Generate True Consensus Graph Loading → NOTEARS → Saved
🚀 Train Logic Validator (SimGNN) Epoch [50/500] → [100/500] … live
🔥 Run Multi-Round Simulation Round 1/N → Round 2/N … per round

Each button streams subprocess output, updates the progress bar based on log markers, and shows full logs in a collapsible expander.

Section 4 — Simulation Results (PoR)

  • Per-round bar chart: accepted vs. rejected clients
  • Detection rate metrics
  • Final consensus graph visualisation (PyVis interactive)
  • GED score distribution

Section 5 — PoR vs. Baseline FedAvg Comparison

Side-by-side comparison panel:

  • Rejection rate per round: PoR vs. baseline
  • Total adversary detection rate comparison
  • Summary table highlighting PoR's advantage

Quick Start

Option A — Local (with virtualenv)

# 1. Clone and set up
git clone https://github.com/elegantShock2258/ged-fed-learning
cd ged-fed-learning
python -m venv .venv && source .venv/bin/activate

# 2. Install PyTorch (choose one):
# CPU only:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# GPU (CUDA 11.8):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# 3. Install remaining dependencies
pip install torch-geometric==2.7.0
pip install -r requirements.txt

# 4. Launch dashboard
streamlit run app.py
# → Open http://localhost:8501

Then use the GUI buttons in order:

  1. 🌐 Generate True Consensus Graph
  2. 🚀 Train Logic Validator (SimGNN)
  3. 🔥 Run Multi-Round Simulation

Option B — Terminal (manual pipeline)

python server/generate_consensus.py   # Step 1
python server/train_simgnn.py         # Step 2
python federated_sim.py               # Step 3 (PoR)
python baseline_fedavg_sim.py         # Step 4 (baseline, optional)

Docker Setup

No local Python required — works on any machine with Docker installed.

# First time (builds image, ~5-10 min):
docker compose up --build

# Subsequent runs (no rebuild):
docker compose up

# Background:
docker compose up -d

# → Open http://localhost:8501

Volume mounts

Host path Container path Purpose
./saved_models/ /app/saved_models/ Persist generated models across container restarts
./params.yaml /app/params.yaml Live config editing — changes take effect without rebuilding

GPU Support (optional)

Uncomment the deploy.resources block in docker-compose.yml and ensure nvidia-container-toolkit is installed on the host. Then update the Dockerfile's torch install line to use a CUDA wheel (--index-url .../cu118).


Configuration Reference

All settings live in params.yaml and are also editable via the Streamlit sidebar.

Dataset

Key Default Description
dataset.name "asia" "asia" (8 nodes) or "alarm" (37 nodes)
dataset.total_samples 10000 Rows sampled from the Bayesian Network
dataset.seed 42 NumPy seed for reproducibility

Core Logic (PoR + NOTEARS)

Key Default Description
core_logic.causal_edge_threshold 0.05 Prune NOTEARS edges below this weight
core_logic.l1_sparsity_penalty 0.0001 L1 regularisation in NOTEARS
core_logic.notears_lr 0.02 NOTEARS Adam learning rate
core_logic.notears_max_iter 200 NOTEARS gradient iterations
core_logic.validator_threshold 0.45 GED threshold τ — below → accept, above → reject
core_logic.consensus_momentum 0.85 Consensus update conservatism (0=aggressive, 1=frozen)

SimGNN Pre-training

Key Default Description
core_logic.simgnn_epochs 500 Training epochs
core_logic.simgnn_lr 0.001 Adam learning rate
core_logic.simgnn_batch_size 32 Graph pairs per batch

Simulation

Key Default Description
simulation.num_clients 30 Total FL clients
simulation.num_false_nodes 5 Adversarial clients (fixed IDs)
simulation.num_rounds 10 FL rounds
simulation.local_epochs 1 Client local training epochs
simulation.batch_size 32 Client batch size
simulation.client_lr 1e-4 Client Adam learning rate

Server

Key Default Description
server.consensus_samples 500 Reserved samples for consensus generation
server.batch_size 32 Server-side batch size

Project Structure

ged-fed-learning/
├── app.py                      # Streamlit dashboard (all 6 sections)
├── federated_sim.py            # PoR FL simulation entry point
├── baseline_fedavg_sim.py      # Baseline FedAvg + cosine-similarity detection
├── params.yaml                 # Central config file
├── requirements.txt            # Python dependencies (curated, no legacy packages)
├── Dockerfile                  # CPU-first, production-grade
├── docker-compose.yml          # With healthcheck, restart, GPU docs
├── graphs.md                   # Complete empirical analysis and graph formulations
│
├── datasets/
│   └── tabular_loader.py       # TabularBNDataset (bnlearn ASIA/ALARM)
│
├── client/
│   ├── models.py               # 3-layer MLP (returns logits + raw features)
│   ├── agent.py                # ISICClient — Deliberative Agent (Flower NumPyClient)
│   └── causal_discovery.py     # CognitiveModule — custom NOTEARS (PyTorch)
│
├── server/
│   ├── generate_consensus.py   # One-off: generates server-side consensus DAG
│   ├── train_simgnn.py         # One-off: pre-trains SimGNN Logic Validator
│   ├── logic_validator.py      # SimGNN + LogicValidator classes
│   ├── aggregator.py           # GED logic validation and filtering
│   └── fed_neat_strategy.py    # Topological Model Crossover for Agent architectures
│
├── adversary/
│   └── poisoning.py            # FalseNode — feature poisoning + label flipping
│
├── graphs/
│   ├── run_all.py              # Generates all 20 evaluation charts
│   ├── collect_per_round_data.py # Collects runtime graph trajectories
│   ├── eval_asr_mta.py         # Computes precise Attack Success Rate
│   └── g01_*.py - g20_*.py     # Individual rendering scripts for metrics
│
├── tests/
│   ├── conftest.py             # Shared fixtures
│   ├── e2e_docker_test.sh      # Automated Docker Integration Script
│   ├── unit/                   # 10 test files (100% logic coverage)
│   └── functional/             # 4 end-to-end full-system workflows
│
├── saved_models/
│   ├── {dataset_name}/
│   │   ├── consensus_graph.gpickle       # Initial + evolved consensus DAG
│   │   ├── simgnn_pretrained.pt          # Pre-trained SimGNN weights
│   │   ├── global_model.pt               # Final FL global model
│   │   ├── simulation_logs.json          # Per-round PoR metrics
│   │   └── ged_scores.json               # Per-round GED score distributions
│   └── baseline/
│       └── simulation_logs.json          # Baseline FedAvg metrics

Test Suite

The project includes a comprehensive test suite with 103 passing tests covering core components of the PoR defense system and end-to-end functional workflows.

Running Tests

# Run all tests
source .venv/bin/activate
pytest tests/

# Run with coverage report
pytest tests/ --cov=. --cov-report=term-missing

# Run specific test module
pytest tests/unit/test_adversary_poisoning.py -v

# Run functional tests only (fast)
pytest tests/functional/ -k "not slow"

# Run functional tests including slow ones
pytest tests/functional/ --runslow

# Run tests with detailed output
pytest tests/ -v --tb=short

Test Coverage Summary

Overall Coverage: 95% (updated based on 962 filtered logical statements)

Module Coverage Statements Highlights
adversary/poisoning.py 100% 33 FalseNode backdoor poisoning, label flipping
client/models.py 100% 105 DynamicGenome (MLP) initialization, forward pass
client/causal_discovery.py 98% 54 NOTEARS causal graph extraction, edge thresholding
tests/unit/test_adversary_poisoning.py 100% 83 10 tests: FalseNode poisoning mechanics
tests/functional/test_full_simulation.py 100% 16 5 tests: End-to-end simulation validation
server/fed_neat_strategy.py 95% 277 FedNEAT aggregation, logic validation, Finetune hooks
server/aggregator.py 95% 231 GED-based client filtering, consensus updates
server/logic_validator.py 93% 128 SimGNN acceptance/rejection logic, Threshold bounds
client/agent.py 100% 98 FL agent tracking evolution via neat genome

Test Organization

Unit Tests by Component

1. Adversarial Attack Tests (test_adversary_poisoning.py)

  • 10 tests — All passing
  • FalseNode initialization with poison parameters
  • Backdoor feature zeroing validation
  • Label flipping to target class
  • Fitness evaluation with poisoned batches
  • Integration: Complete adversarial setup

Example:

def test_false_node_poisons_trigger_feature(mock_device, sample_data_loader):
    """Test trigger feature remains in valid range."""
    attacker = FalseNode(
        cid="adv_0",
        train_loader=sample_data_loader,
        test_loader=sample_data_loader,
        device=mock_device,
        feature_names=["feat_0", ..., "feat_4"],
        target_label=1
    )
    assert 0 <= attacker.trigger_feature_idx < 5

2. Server Integration Tests (test_server_integration.py)

  • 9 tests — All passing
  • Model directory creation
  • Consensus graph serialization
  • NOTEARS parameter validation
  • SimGNN initialization & training methods
  • Configuration handling (YAML, device selection)

3. Client Logic Tests (test_models.py)

  • 8 tests — All passing (100% coverage)
  • MLP shape validation (batch norm, dropout)
  • Data type consistency (float32)
  • Forward pass output shapes

4. Causal Discovery Tests (test_causal_discovery.py)

  • 9 tests — 95% coverage
  • NOTEARS zero-input handling
  • Node naming consistency
  • DAG edge thresholding

5. Aggregator Tests (test_aggregator_comprehensive.py)

  • 18 tests — 92% coverage
  • GED-based acceptance thresholds
  • Consensus momentum blending
  • JSON persistence of GED scores

6. FedNEAT Strategy Tests (test_fed_neat_strategy_comprehensive.py)

  • 22 tests — 95% coverage
  • Aggregate fit with PoR gate
  • Dynamic threshold updates
  • Graph persistence (honest vs. rejected)

7. Tabular Loader Tests (test_tabular_loader.py)

  • 2 tests — 88% coverage
  • ASIA/ALARM dataset loading
  • Feature dimensionality

Functional Tests (End-to-End)

Full Simulation Tests (tests/functional/test_full_simulation.py)

  • 3 tests — Fast integration tests for complete workflows (5 total, 2 slow)
  • Output file structure verification
  • Simulation logs and GED scores validation
  • Adversary detection in functional context
  • End-to-end PoR simulation validation (marked @pytest.mark.slow)
  • Baseline FedAvg simulation validation (marked @pytest.mark.slow)

Simulation Scripts as Functional Tests:

  • federated_sim.py — Complete PoR FL workflow (10 rounds)
  • baseline_fedavg_sim.py — Baseline FedAvg comparison
  • These serve as the primary functional tests, running full simulations

Coverage by Attack Vector

Attack Type Test Module Key Test Status
Feature Poisoning test_adversary_poisoning.py test_false_node_poisons_trigger_feature ✅ PASS
Label Flipping test_adversary_poisoning.py test_false_node_targets_specific_label ✅ PASS
Fitness Evaluation test_adversary_poisoning.py test_evaluate_fitness_returns_numerical_score ✅ PASS
Server Integration test_server_integration.py test_get_model_dir_creates_valid_path ✅ PASS
Consensus Generation test_server_integration.py test_consensus_graph_structure ✅ PASS

Test Results

======================== 103 passed, 1 skipped ===========================

PASSING TEST MODULES:
✅ test_adversary_poisoning.py
✅ test_server_integration.py
✅ test_models.py
✅ test_causal_discovery.py
✅ test_aggregator_comprehensive.py
✅ test_fed_neat_strategy_comprehensive.py
✅ test_server_logic.py
✅ test_tabular_loader.py
✅ test_client_agent.py
✅ test_full_simulation.py
✅ test_imports.py
✅ test_simulations.py

SKIPPED (non-critical):
⚠️ test_tabular_loader.py::1 test (skips downloading big external payloads)

Note: All core PoR logic is fully validated via passing tests.
Functional tests marked with @pytest.mark.slow can be run with --runslow.

🚀 Continuous Integration (CI/CD)

This project features a fully automated End-to-End (E2E) Docker Integration Test pipeline located in .github/workflows/e2e-integration.yml.

Whenever code is pushed to main or fed-neat-evolution, GitHub Actions automatically:

  1. Provisions a fresh Ubuntu VM.
  2. Dynamically configures params.yaml to spin up an accelerated 3-client (1 adversary) FedNEAT environment.
  3. Bootstraps the pipeline from scratch within Docker (proving the repository works natively "out-of-the-box" for any researcher without local dependency caching).
  4. Validates that the entire complex execution chain finishes successfully and materialises critical evaluation artifacts (consensus graph, GED scores, models).

Run it manually on your local system in under 90 seconds to verify environment stability:

./tests/e2e_docker_test.sh

📊 Evaluation & Visualisation Graphs

Full analysis, abbreviation glossary, formulas, per-graph conclusions, and paper references: → graphs.md

A complete suite of 20 publication-quality evaluation graphs is generated by the graphs/ pipeline. They cover: detection performance, causal graph quality, model convergence, Byzantine fault tolerance, and attack success rate — comparing the Causal PoR defense against the Baseline FedAvg + Cosine method and published SOTA defenses (Krum, FLAME, FoolsGold, Trimmed Mean).

Quick Start — Generate All 20 Graphs

# Activate virtual environment
source .venv/bin/activate

# Step 1 — Generate all 20 graphs from existing saved simulation data
python graphs/run_all.py

# Step 2 (recommended) — Compute exact MTA + ASR evaluation metrics
python graphs/eval_asr_mta.py
python graphs/run_all.py   # re-run to embed exact values in G18 & G20

# Step 3 (optional) — Get per-round GED + consensus snapshots for G17 & G19
python graphs/collect_per_round_data.py   # patches simulation (one-time)
python federated_sim.py                   # re-run simulation
python graphs/run_all.py                  # regenerate with exact trajectories

# Revert simulation patch
python graphs/collect_per_round_data.py --revert

All output PNGs are saved to graphs/G*.png at 300 DPI.


Visualisation Blueprint — All 20 Graphs

# Output File Blueprint Metric Data Source Status
G01 G01_per_round_acceptance_bars.png Per-round accepted/rejected clients simulation_logs.json ✅ Real data
G02 G02_ged_score_distribution.png GED score distributions: Honest vs. Adversary ged_scores.json ✅ Real data
G03 G03_roc_curve.png ROC curve + AUC of GED detector ged_scores.json ✅ Real data
G04 G04_threshold_sensitivity.png ADR and FPR vs. detection threshold τ sweep ged_scores.json ✅ Real data
G05 G05_cumulative_suppression.png Cumulative adversary suppression + HCPR/round simulation_logs.json ✅ Real data
G06 G06_simgnn_speedup_benchmarks.png SimGNN vs. A* GED: runtime speedup + MSE Published benchmarks [Bai et al., 2019] ✅ Published
G07 G07_loss_convergence_efficiency.png Baseline loss curve + PoR aggregation efficiency baseline/simulation_logs.json ✅ Real data
G08 G08_consensus_jaccard_groundtruth.png Consensus graph Jaccard vs. ASIA ground truth consensus_graph.gpickle ✅ Real data
G09 G09_radar_detection_metrics.png Radar: Precision / Recall / F1 / ADR / Specificity ged_scores.json ✅ Real data
G10 G10_genome_architecture.png FedNEAT evolved genome topology (Round 15) realtime_state.json ✅ Real data
G11 G11_asia_ground_truth_vs_consensus.png ASIA ground truth DAG vs. PoR consensus DAG consensus_graph.gpickle ✅ Real data
G12 G12_ged_score_per_client.png Per-client GED scatter + TP/FP/TN/FN confusion ged_scores.json ✅ Real data
G13 G13_rejected_edge_diff.png Adversary's missing/extra edges vs. consensus rejected_edge_diff.json ✅ Real data
G14 G14_defense_comparison_table.png P/R/F1 comparison: PoR vs. 5 SOTA defenses ged_scores.json + literature ✅ Real + lit.
G15 G15_notears_edge_analysis.png NOTEARS edge quality: SHD, FDR, P/R per client rejected_edge_diff.json ⚠️ Estimated
G16 G16_byzantine_tolerance.png Byzantine breakdown point + EAE stability simulation_logs.json + theory ✅ Real data
G17 G17_multiround_ged_trend.png Multi-round GED trajectory: honest vs. adversary ged_scores.json (reconstructed) ⚠️ Reconstructed
G18 G18_main_task_accuracy.png Main Task Accuracy (MTA) vs. round eval_results.json (auto) ⚠️ Run eval script
G19 G19_consensus_jaccard_rounds.png Consensus Jaccard per round (convergence curve) Per-round consensus files ⚠️ Re-simulate
G20 G20_attack_success_rate.png Attack Success Rate (ASR): PoR vs. SOTA eval_results.json (auto) ⚠️ Run eval script

Legend: ✅ Uses real simulation data directly · ⚠️ See graphs.md for how to obtain exact values.

Graph Scripts Reference

Script Purpose
graphs/run_all.py Master runner — generates all 20 graphs in sequence
graphs/style_config.py Shared colour palette, fonts, DPI, and data loaders
graphs/collect_per_round_data.py Patches simulation to save per-round GED + consensus
graphs/eval_asr_mta.py Post-simulation MTA + ASR evaluation
graphs/g01_*.pygraphs/g20_*.py Individual graph generation scripts

Threat Models

Attack Method PoR Detection Baseline Detection
Feature Poisoning Zero out feature column each batch ✅ High GED (missing edges in DAG) ❌ Weights look normal
Label Flipping Flip 20% of labels to target class ✅ Corrupted graph topology ❌ Small weight delta
Explanation Poisoning Submit fake/random DAG directly ✅ SimGNN detects divergence ❌ Not graph-aware
Distributed Backdoor (DBA) [Xie et al., 2020] Each client injects partial trigger ✅ Structural auditing catches combined dependency ❌ Each client looks "normal" individually

Citation / Reference

Defense & Attack Mechanics:

McMahan, B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2017).
Communication-Efficient Learning of Deep Networks from Decentralized Data
Advances in Artificial Intelligence and Statistics (AISTATS).

Xie, C., Huang, K., Chen, P. Y., & Li, B. (2020).
DBA: Distributed Backdoor Attacks against Federated Learning
International Conference on Learning Representations (ICLR).

Graph & Logic Validation:

Zheng, X., Aragam, B., Ravikumar, P., & Xing, E. P. (2018).
DAGs with NO TEARS: Continuous optimization for structure learning
Advances in Neural Information Processing Systems, 31.

Bai, Y., Ding, H., Bian, S., Chen, T., Sun, Y., & Wang, W. (2019).
SimGNN: A Neural Network Approach to Fast Graph Similarity Computation
Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM).

Bayesian Networks & Neuroevolution:

Lauritzen, S. L., & Spiegelhalter, D. J. (1988).
Local computations with probabilities on graphical structures and their application to expert systems
Journal of the Royal Statistical Society: Series B (Methodological), 50(2), 157–224.

Beinlich, I. A., Suermondt, H. J., Chavez, R. M., & Cooper, G. F. (1989).
The ALARM Monitoring System: A Case Study with Two Probabilistic Inference Techniques for Belief Networks
Proceedings of the 2nd European Conference on Artificial Intelligence in Medicine (AIME 89).

Stanley, K. O., & Miikkulainen, R. (2002).
Evolving Neural Networks through Augmenting Topologies
Evolutionary Computation, 10(2), 99–127.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors