PhishGuard M4 - Storage & MongoDB Integration

Overview

Module 4 (M4) stores and retrieves analyzed email data in MongoDB Atlas. It relies on the signing functions from M3: each analysis is signed before storage and verified on retrieval, so tampering with stored records can be detected.

Architecture

Components

  1. mongo_client.py - MongoDB connection management

    • Singleton client pattern with connection pooling
    • Secure connection handling with TLS/SSL
    • Automatic connection validation and retry logic
    • Connection timeouts and explicit error handling
  2. storage.py - Database operations layer

    • Insert verified analyses with signature checking
    • Load and verify stored analyses
    • Automatic index management (gmail_id, processed_at, risk_label)
    • Optional TTL (Time-To-Live) index for automatic cleanup
  3. app.py (History Tab) - User interface

    • View all stored analyses with signature verification status
    • Filter by risk level (SAFE, REVIEW, HIGH_RISK)
    • Display analysis details with visual indicators
    • Real-time signature verification

Security Features

1. Encrypted Credentials

  • MongoDB URI stored in encrypted secrets.enc file
  • Loaded at runtime via M3 decryption
  • Never hardcoded in source code

2. Digital Signatures

  • Every analysis signed with HMAC-SHA256 before storage
  • Signatures verified on retrieval
  • Invalid signatures clearly marked in UI
  • Tamper detection and alerting
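The sign-then-verify flow above can be sketched with the standard library alone. This is a minimal illustration, not the M3 implementation: the names `sign_document` and `verify_document`, the canonical JSON serialization, and the choice to exclude `_id` and `signature` from the signed payload are all assumptions.

```python
import base64
import hashlib
import hmac
import json


def _canonical_bytes(doc: dict) -> bytes:
    """Serialize a document deterministically, leaving out unsigned fields."""
    # Assumption: _id and signature are not part of the signed payload.
    payload = {k: v for k, v in doc.items() if k not in ("_id", "signature")}
    # sort_keys gives a stable field order; default=str renders datetimes as text.
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"), default=str)
    return text.encode("utf-8")


def sign_document(doc: dict, secret: bytes) -> str:
    """Return a base64-encoded HMAC-SHA256 over the canonical document."""
    digest = hmac.new(secret, _canonical_bytes(doc), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_document(doc: dict, secret: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_document(doc, secret).encode("ascii")
    stored = str(doc.get("signature", "")).encode("utf-8")
    return hmac.compare_digest(expected, stored)
```

One thing to watch with any scheme like this: MongoDB stores dates with millisecond precision, so a `datetime` carrying microseconds may serialize differently after a round trip. Truncating `processed_at` to milliseconds before signing is one way to keep the signature stable.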

3. Session-Based Insertion

  • Only analyses from current app session are stored
  • Prevents backdating or stale data insertion
  • Uses app_start_time as session marker
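The session rule can be pictured as a simple timestamp comparison. The helper below is a hypothetical sketch; the name `is_from_current_session` and the decision to reject documents without a valid `processed_at` are assumptions, not the behavior of `storage.py`.

```python
from datetime import datetime, timedelta


def is_from_current_session(doc: dict, app_start_time: datetime) -> bool:
    """Accept only documents processed at or after the session start."""
    processed_at = doc.get("processed_at")
    if not isinstance(processed_at, datetime):
        # Assumption: a missing or non-datetime timestamp is treated as stale.
        return False
    return processed_at >= app_start_time


if __name__ == "__main__":
    start = datetime(2025, 11, 2, 9, 0, 0)
    fresh = {"processed_at": start + timedelta(minutes=5)}
    stale = {"processed_at": start - timedelta(days=1)}
    print(is_from_current_session(fresh, start), is_from_current_session(stale, start))
```

Both timestamps need to be of the same kind (both naive or both timezone-aware); Python raises `TypeError` when comparing a naive `datetime` with an aware one.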

4. Secure Connection

  • TLS/SSL enforced for all MongoDB connections
  • Strict certificate validation
  • Connection timeout protection
  • Retry logic for transient failures
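Retry logic for transient failures is commonly written as a small wrapper with exponential backoff. The sketch below shows that general pattern and is not the code in `mongo_client.py`; `with_retry` and its parameters are hypothetical. With PyMongo, the exception passed in `retry_on` would plausibly be `pymongo.errors.ConnectionFailure`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retry(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Errors that are not listed in `retry_on` (for example, a malformed URI) propagate immediately, which keeps configuration mistakes from being hidden behind retries.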

Database Schema

Collection: email_analyses

{
  "_id": ObjectId("..."),
  "gmail_id": "18ab123...",           // Unique email ID
  "sender": "sender@example.com",
  "subject": "Email Subject",
  "date": "Sat, 01 Nov 2025 10:00:00",
  "risk_score": 85.5,                 // Final merged score (0-100)
  "risk_label": "HIGH_RISK",          // SAFE, REVIEW, HIGH_RISK
  "ai_risk_score": 90.0,              // AI component score
  "heuristic_risk_score": 75.0,       // Heuristic component score
  "intents": ["phishing", "urgency"], // Detected intents
  "indicators": [...],                // Heuristic indicators
  "ai_summary": "This email...",      // AI-generated summary
  "recommendations": [...],           // Action recommendations
  "processed_at": ISODate("..."),     // Timestamp
  "mock_mode": false,                 // AI mode flag
  "signature": "base64encodedhmac..." // HMAC-SHA256 signature
}

Indexes

  1. gmail_id (Unique) - Prevents duplicate entries
  2. processed_at (Descending) - Efficient time-based queries
  3. risk_label (Ascending) - Fast filtering by risk level
  4. processed_at TTL (Optional) - Auto-delete after 7 days
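The four indexes listed above could be created roughly as follows. This is a sketch under assumptions: `ensure_indexes` and the index names are hypothetical, and `collection` is expected to be a PyMongo `Collection` (or any object with a compatible `create_index` method). The TTL index uses the ascending key so that it does not clash with the descending `processed_at` index.

```python
from typing import Optional


def ensure_indexes(collection, ttl_days: Optional[int] = None) -> None:
    """Create the indexes described above; safe to call repeatedly."""
    # Unique index: a second insert with the same gmail_id is rejected.
    collection.create_index([("gmail_id", 1)], unique=True, name="gmail_id_unique")
    # Descending index for newest-first listings.
    collection.create_index([("processed_at", -1)], name="processed_at_desc")
    # Ascending index for filtering by risk level.
    collection.create_index([("risk_label", 1)], name="risk_label_asc")
    if ttl_days is not None:
        # TTL index: MongoDB removes documents once processed_at is older than this.
        collection.create_index(
            [("processed_at", 1)],
            expireAfterSeconds=ttl_days * 24 * 60 * 60,
            name="processed_at_ttl",
        )
```

TTL expiry only applies when `processed_at` is stored as a BSON date, and MongoDB removes expired documents in a periodic background pass, so deletion is not immediate.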

Setup Instructions

1. MongoDB Atlas Setup

  1. Create a free MongoDB Atlas account at https://www.mongodb.com/cloud/atlas
  2. Create a new cluster (M0 Free Tier works for development)
  3. Configure network access:
    • Add your IP address to IP Access List
    • Or use 0.0.0.0/0 to allow access from any address (development only)
  4. Create database user with read/write permissions
  5. Get your connection string (mongodb+srv://...)

2. Add MongoDB URI to Secrets

Run the secrets encryption script and add MONGO_URI:

python encrypt_secrets.py

When prompted, add:

MONGO_URI=mongodb+srv://username:password@cluster.mongodb.net/phishguard?retryWrites=true&w=majority
SIGNING_SECRET=your-signing-secret-key

3. Install Dependencies

pip install pymongo cryptography

Or from pyproject.toml:

pip install -e .

Usage

In Streamlit App

  1. Load Secrets (Security Tab)

    • Enter your master passphrase
    • Click "Load Secrets"
    • MongoDB integration auto-enables
  2. Scan Emails (Email Analysis Tab)

    • Analyses automatically saved to MongoDB
    • Signature added to each document
  3. View History (History Tab)

    • See all stored analyses
    • Filter by risk level
    • Check signature verification status
    • Expand for detailed view

Programmatic Access

from datetime import datetime
from getpass import getpass

from crypto_simple import load_encrypted
from signing import sign
from storage import insert_analysis, load_analyses

# Record the session start before creating any analysis;
# insert_analysis uses it as the session marker
app_start_time = datetime.now()

# Load secrets
passphrase = getpass('Master passphrase: ')
secrets = load_encrypted('secrets.enc', passphrase)
mongo_uri = secrets['MONGO_URI']
signing_secret = secrets['SIGNING_SECRET'].encode('utf-8')

# Build the analysis document
analysis_doc = {
    'gmail_id': '18ab123...',
    'sender': 'test@example.com',
    'subject': 'Test Email',
    'risk_score': 85,
    'risk_label': 'HIGH_RISK',
    'processed_at': datetime.now(),
    # ... other fields
}

# Sign and insert
analysis_doc['signature'] = sign(analysis_doc, signing_secret)

success = insert_analysis(
    analysis_doc,
    app_start_time=app_start_time,
    mongo_uri=mongo_uri,
    signing_secret=signing_secret
)

# Load analyses
analyses = load_analyses(
    mongo_uri=mongo_uri,
    signing_secret=signing_secret,
    filter_by={'risk_label': 'HIGH_RISK'},
    limit=50
)

# Check signatures
for analysis in analyses:
    if analysis['signature_valid']:
        print(f"✅ {analysis['subject']}")
    else:
        print(f"⚠️ {analysis['subject']} - INVALID SIGNATURE")

API Reference

mongo_client.py

get_mongo_client(mongo_uri: str, force_new: bool = False) -> MongoClient

Get or create MongoDB client with connection pooling.

Parameters:

  • mongo_uri: MongoDB Atlas connection string
  • force_new: Force new connection (default: False)

Returns: MongoClient instance

Raises:

  • ConnectionFailure: Connection failed
  • ValueError: Invalid URI format
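A singleton client with validation, as described for `get_mongo_client`, might look like the sketch below. It is illustrative only: `get_client`, the injectable `factory` parameter, and the specific timeout and pool values are assumptions. The default factory uses PyMongo's `MongoClient` with options that library does accept (`tls`, `serverSelectionTimeoutMS`, `connectTimeoutMS`, `maxPoolSize`).

```python
import threading
from typing import Any, Callable, Optional

_client: Optional[Any] = None
_lock = threading.Lock()


def _default_factory(uri: str) -> Any:
    # Imported lazily so this sketch loads even where pymongo is not installed.
    from pymongo import MongoClient
    return MongoClient(
        uri,
        tls=True,
        serverSelectionTimeoutMS=5000,  # assumed value
        connectTimeoutMS=10000,         # assumed value
        maxPoolSize=10,                 # assumed value
    )


def get_client(
    uri: str,
    force_new: bool = False,
    factory: Callable[[str], Any] = _default_factory,
) -> Any:
    """Return the cached client, creating and validating it on first use."""
    global _client
    if not uri.startswith(("mongodb://", "mongodb+srv://")):
        raise ValueError("URI must start with mongodb:// or mongodb+srv://")
    with _lock:
        if force_new and _client is not None:
            _client.close()
            _client = None
        if _client is None:
            client = factory(uri)
            client.admin.command("ping")  # fail fast before caching the client
            _client = client
        return _client
```

Passing a `factory` makes the caching logic testable without a live cluster; production code would normally rely on the default.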

close_mongo_client() -> None

Close global MongoDB connection gracefully.

test_connection(mongo_uri: str) -> bool

Test connection without caching client.

storage.py

insert_analysis(analysis_doc, app_start_time, mongo_uri, signing_secret) -> bool

Insert signed analysis into MongoDB.

Parameters:

  • analysis_doc: Analysis document with all fields
  • app_start_time: Session start time for filtering
  • mongo_uri: MongoDB connection URI
  • signing_secret: Secret for signature verification

Returns: True if inserted, False otherwise

load_analyses(mongo_uri, signing_secret, filter_by=None, limit=None, skip=0) -> List[Dict]

Load and verify analyses from MongoDB.

Parameters:

  • mongo_uri: MongoDB connection URI
  • signing_secret: Secret for verification
  • filter_by: MongoDB query filter (optional)
  • limit: Max documents to return (optional)
  • skip: Skip N documents (pagination)

Returns: List of analysis documents, each with a signature_valid field (documents that fail verification are still returned, with signature_valid set to False)
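The `limit` and `skip` parameters lend themselves to page-by-page iteration. The generator below is a hypothetical helper, not part of `storage.py`; it takes any function that accepts `(skip, limit)` and returns a list, so it can be tried without a database.

```python
from typing import Callable, Dict, Iterator, List


def iter_pages(
    fetch_page: Callable[[int, int], List[Dict]],
    page_size: int = 50,
) -> Iterator[List[Dict]]:
    """Yield successive pages until a short or empty page signals the end."""
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        if not page:
            return
        yield page
        if len(page) < page_size:
            return  # last page reached; avoid one extra round trip
        skip += page_size


# Possible wiring to load_analyses (names taken from the signature above):
# pages = iter_pages(
#     lambda skip, limit: load_analyses(
#         mongo_uri=mongo_uri, signing_secret=signing_secret,
#         skip=skip, limit=limit,
#     )
# )
```

Skip-based pagination gets slower as `skip` grows, because the server still walks past the skipped documents; for large histories, filtering on `processed_at` ranges may scale better.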

get_analysis_by_gmail_id(gmail_id, mongo_uri, signing_secret) -> Optional[Dict]

Retrieve specific analysis by Gmail ID.

count_analyses(mongo_uri, filter_by=None) -> int

Count analyses matching filter.

delete_old_analyses(mongo_uri, days_old=7) -> int

Delete analyses older than specified days.
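Age-based deletion comes down to computing a cutoff and issuing one `delete_many`. The sketch below assumes a PyMongo-style collection and timestamps stored in UTC; `delete_older_than` and its `now` parameter are hypothetical and exist mainly to make the cutoff arithmetic visible.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def delete_older_than(collection, days_old: int = 7, now: Optional[datetime] = None) -> int:
    """Delete documents whose processed_at is before now - days_old; return the count."""
    if days_old < 0:
        raise ValueError("days_old must not be negative")
    now = now or datetime.now(timezone.utc)  # assumption: processed_at is stored in UTC
    cutoff = now - timedelta(days=days_old)
    result = collection.delete_many({"processed_at": {"$lt": cutoff}})
    return result.deleted_count
```

If the optional TTL index is enabled, MongoDB performs the same cleanup in the background and a manual call like this is only needed for a different retention window.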

Error Handling

Connection Errors

  • Automatic retry for transient failures
  • Clear error messages for configuration issues
  • Graceful fallback when MongoDB unavailable

Signature Verification

  • Invalid signatures logged but not rejected
  • UI clearly indicates verification status
  • Tampered records marked with warnings

Data Validation

  • Required fields checked before insertion
  • Duplicate gmail_id prevented by unique index
  • Type validation for datetime fields
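Pre-insert validation along these lines can be sketched as a function that returns a list of problems. Which fields are actually required is not stated in this document, so `REQUIRED_FIELDS` below is an assumption drawn from the schema, and `validation_errors` is a hypothetical name.

```python
from datetime import datetime
from typing import List

# Assumption: this subset of the schema is treated as mandatory.
REQUIRED_FIELDS = (
    "gmail_id", "sender", "subject", "risk_score",
    "risk_label", "processed_at", "signature",
)
VALID_LABELS = {"SAFE", "REVIEW", "HIGH_RISK"}


def validation_errors(doc: dict) -> List[str]:
    """Return human-readable problems; an empty list means the document passes."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in doc]
    if "processed_at" in doc and not isinstance(doc["processed_at"], datetime):
        errors.append("processed_at must be a datetime")
    if "risk_label" in doc and doc["risk_label"] not in VALID_LABELS:
        errors.append(f"unknown risk_label: {doc['risk_label']!r}")
    return errors
```

Returning every problem at once, rather than raising on the first, tends to produce more useful log entries when an insert is rejected.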

Best Practices

Security

  1. ✅ Always use encrypted secrets (never hardcode)
  2. ✅ Enable MongoDB Atlas IP allowlisting
  3. ✅ Rotate signing secrets periodically
  4. ✅ Use strong database passwords
  5. ✅ Enable MongoDB encryption at rest

Performance

  1. ✅ Use connection pooling (handled automatically)
  2. ✅ Limit query results with pagination
  3. ✅ Use indexes for filtering (auto-created)
  4. ✅ Consider TTL index for automatic cleanup
  5. ✅ Monitor connection pool metrics

Operations

  1. ✅ Backup MongoDB regularly via Atlas
  2. ✅ Monitor signature verification failures
  3. ✅ Set up alerts for connection issues
  4. ✅ Review old data retention policy
  5. ✅ Test restore procedures

Troubleshooting

"MongoDB connection failed"

  • Check MONGO_URI in secrets.enc
  • Verify network access (IP Access List in Atlas)
  • Confirm database user credentials
  • Check firewall settings

"Signature verification failed"

  • Ensure SIGNING_SECRET is consistent
  • Check for data tampering
  • Verify secret encoding (bytes vs string)
  • Confirm signing process is correct
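The bytes-versus-string point is easy to reproduce with Python's `hmac` module directly. The snippet below is a standalone demonstration, independent of the project's `signing` module: a `str` key is rejected outright, and the same secret text encoded two different ways yields two different signatures.

```python
import hashlib
import hmac

message = b'{"gmail_id":"18ab"}'

# 1. hmac.new requires a bytes-like key; passing str raises TypeError.
try:
    hmac.new("my-secret", message, hashlib.sha256)
except TypeError as exc:
    str_key_error = exc
else:
    str_key_error = None

# 2. The same text under different encodings is a different key.
sig_utf8 = hmac.new("sécret".encode("utf-8"), message, hashlib.sha256).hexdigest()
sig_latin1 = hmac.new("sécret".encode("latin-1"), message, hashlib.sha256).hexdigest()

if __name__ == "__main__":
    print("str key rejected:", str_key_error is not None)
    print("encodings agree:", sig_utf8 == sig_latin1)
```

Encoding the secret once, in one place, and passing the resulting bytes to both signing and verification avoids this class of mismatch.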

"No analyses found"

  • Verify secrets are loaded in Security tab
  • Ensure emails were scanned after loading secrets
  • Check MongoDB Atlas for stored documents
  • Verify collection name matches

Duplicate key errors

  • Gmail IDs are unique by design
  • Emails that are scanned again are skipped rather than inserted a second time
  • Check logs for insertion attempts

Integration with Other Modules

M1 (Gmail API & Parsing)

  • Receives parsed email data
  • Uses gmail_id as unique identifier

M2 (AI Analysis)

  • Stores AI risk scores and insights
  • Preserves intents and recommendations

M3 (Encryption & Signing)

  • Uses signing functions for integrity
  • Loads MongoDB URI from encrypted secrets
  • Verifies signatures on retrieval

Future Enhancements

  • Real-time analysis streaming
  • Advanced query interface
  • Analysis export (CSV, JSON bulk)
  • Data retention policy UI
  • Analytics dashboard
  • Multi-tenant support
  • Encrypted fields in MongoDB

Version History

  • v0.2.0 (M4) - Initial MongoDB integration
    • Secure storage with signature verification
    • History tab with filtering
    • Connection pooling and error handling

Support

For issues or questions:

  1. Check MongoDB Atlas status
  2. Verify connection string format
  3. Review logs for detailed errors
  4. Ensure modules M1-M3 are working correctly

Module Status: ✅ Production Ready
Last Updated: November 2, 2025
Next Module: TBD