Khiladi-786/Phishing_Deployment

🛡️ Phishing URL Detection System

Python Random Forest Flask Docker SHAP

🚀 Live Demo • 📖 Documentation • 🎯 Features • 💡 How It Works


🎬 Live Demo

🔴 Production System in Action

Phishing Detector UI

👉 Try the Live System | Real-time phishing URL detection

Test it now:

  • ✅ Legitimate: https://google.com → 41% phishing confidence
  • 🚨 Phishing: http://secure-verify-account.xyz/banking → 90% phishing confidence

🎯 What is This?

Protect users from phishing attacks using explainable AI.

A production-ready machine learning system that analyzes URLs in real time and flags phishing attempts with 89.63% accuracy. It is trained on 11,430 URLs, explained with SHAP, and deployed via Flask + Docker.

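# Classify one URL (assumes extract_features returns a single-row feature matrix)
features = extract_features(url)
label = "PHISHING" if model.predict(features)[0] == 1 else "LEGITIMATE"
confidence = model.predict_proba(features)[0].max()  # probability of the predicted class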

🛡️ Cybersecurity Impact

| 🎯 Accuracy | 📊 Dataset | ⚡ Speed | 🔍 Explainability |
|---|---|---|---|
| 89.63% | 11,430 URLs | <100ms | SHAP Analysis |

✨ Key Features

🧠 Advanced ML

  • Random Forest Classifier (100 estimators)
  • 57 URL-extractable features
  • No webpage scraping needed
  • SHAP explainability for transparency

⚡ Real-Time Detection

  • <100ms response time
  • Flask REST API (/predict endpoint)
  • Modern dark UI with confidence bars
  • Instant classification

📊 Production-Ready

  • Docker containerized
  • Deployed on Render
  • UI displays 93.5% model accuracy (the classification metrics below report 89.63%)
  • Perfectly balanced dataset

🔍 Deep Analysis

  • EDA: Histograms, pair plots, heatmaps
  • SHAP: Feature importance analysis
  • Engineered feature: special_char_ratio
  • 11,430 URLs analyzed

🎬 Feature Showcase

🎨 Modern Dark UI Interface
UI Screenshot

What you see:

  • 🛡️ Shield icon with gradient background
  • ⚡ Input field: paste any URL
  • 🔮 "Analyze URL" button: instant classification
  • ✅ Result card:
    • Green checkmark for LEGITIMATE
    • 41.00% phishing confidence (59% legitimate)
    • Progress bar visualization
  • 📊 Model stats:
    • 93.5% MODEL ACCURACY
    • 11,430 URLs TRAINED
    • 57 FEATURES analyzed

📊 Dataset & EDA Analysis

Dataset Composition

| Property | Details |
|---|---|
| Total URLs | 11,430 |
| Phishing | 5,715 (50%) |
| Legitimate | 5,715 (50%) |
| Balance | ✅ Perfectly balanced |
| Longest URL | 1,641 characters |

Statistical Insights

URL Length Distribution:

  • Mean: 61.1 characters
  • Median: 55.0 characters
  • Std Dev: 55.3
  • Insight: Mean > Median → Right-skewed distribution
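
The mean-versus-median reasoning above can be reproduced with the standard library. This is a minimal sketch on a small made-up list of URL lengths (not the project's data); only the last value mirrors the longest URL reported in the dataset table. Comparing mean and median is a quick heuristic for skew, not a formal test.

```python
import statistics

# Hypothetical URL lengths in characters; NOT taken from the project's dataset.
url_lengths = [23, 31, 40, 48, 55, 55, 62, 70, 95, 1641]

mean_len = statistics.mean(url_lengths)
median_len = statistics.median(url_lengths)

# A long right tail (a few very long URLs) drags the mean above the median.
is_right_skewed = mean_len > median_len
print(f"mean={mean_len:.1f} median={median_len:.1f} right_skewed={is_right_skewed}")
```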

Visual Analysis:

  • Histograms: Legitimate URLs cluster 20-100 chars; Phishing URLs scattered widely
  • Pair Plots: Legitimate sites in bottom-left quadrant; Phishing sites scattered
  • Correlation Heatmap:
    • length_url ↔ nb_dots: +0.44
    • length_url ↔ ratio_digits_url: +0.45
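
The heatmap values are pairwise Pearson correlation coefficients. As a rough illustration of how such a value is obtained, here is a hand-rolled sketch; the two short columns are invented for the example and will not reproduce the +0.44 or +0.45 figures above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-URL values; NOT the project's data.
length_url = [20, 35, 48, 60, 75, 120]
nb_dots = [1, 2, 2, 3, 2, 5]

r = pearson(length_url, nb_dots)
print(f"length_url vs nb_dots: r = {r:+.2f}")
```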

🧠 SHAP Explainability Analysis

Top Feature Importance (SHAP):

1. 🥇 google_index: Most Critical

If a site is indexed by Google, it is very likely legitimate, although the signal is not 100% reliable.

2. 🥈 special_char_ratio: Engineered Feature

Phishing URLs use complex punctuation to obfuscate identity.
This custom feature proved highly significant in SHAP analysis.
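
This README does not spell out how special_char_ratio is computed. One plausible reading, shown here purely as an assumption, is the share of characters in the URL that are neither letters nor digits:

```python
def special_char_ratio(url: str) -> float:
    """Share of characters in `url` that are neither letters nor digits.

    Assumed definition for illustration; the project's exact formula
    is not given in this README.
    """
    if not url:
        return 0.0
    special = sum(1 for ch in url if not ch.isalnum())
    return special / len(url)

example = "http://secure-verify-account.xyz/banking"
print(example, round(special_char_ratio(example), 3))
```

Under this definition the ratio depends on URL length as well as punctuation count, which is one reason it is better read alongside the other features than on its own.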

3. 🥉 nb_dots, length_url, ratio_digits_url

Combined weak signals create strong prediction power.

Key Insight:

No single feature can perfectly separate phishing from legitimate URLs.
Random Forest combines all 57 features for accurate detection.
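
Why combining weak signals helps can be seen in a simplified model. The sketch below is plain majority voting over independent signals, not a Random Forest, and the 0.6 per-signal accuracy is an invented number. It only shows the direction of the effect: many individually weak but independent votes can add up to a much more reliable decision.

```python
from math import comb

def majority_vote_accuracy(n_signals: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent signals is right.

    Each signal is correct with probability p_correct. Independence is an
    idealisation; real URL features are correlated.
    """
    needed = n_signals // 2 + 1  # smallest strict majority
    return sum(
        comb(n_signals, k) * p_correct**k * (1 - p_correct) ** (n_signals - k)
        for k in range(needed, n_signals + 1)
    )

for n in (1, 11, 57):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Because real features are correlated, the gain in practice is smaller than this idealised calculation suggests.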


🧠 How It Works

graph LR
    A[📤 User Enters URL] --> B[🧹 Feature Extraction]
    B --> C[57 Features Computed]
    C --> D[🤖 Random Forest Model]
    D --> E[🎯 Prediction + Confidence]
    E --> F[✅ LEGITIMATE or 🚨 PHISHING]

    style A fill:#e1f5ff
    style D fill:#ffe1e1
    style F fill:#e1ffe1

🔬 Feature Engineering Pipeline

| Step | What Happens | Example Features |
|---|---|---|
| 1. URL Parsing | Extract components | length_url, nb_dots, nb_hyphens |
| 2. Character Analysis | Count special chars | nb_at, nb_slash, ratio_digits_url |
| 3. Domain Analysis | Check domain properties | google_index, tld_in_path, punycode |
| 4. Path Analysis | Examine URL path | nb_redirection, http_in_path |
| 5. Custom Features | Engineered signals | special_char_ratio, total_special_chars |
| 6. Prediction | Random Forest classification | LEGITIMATE (0) or PHISHING (1) |
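
A few of the string-based features in the table can be sketched with the standard library alone. The function name extract_basic_features and every definition below are assumptions for illustration; the project's own 57-feature extractor may differ. Features such as google_index need an external lookup and are left out.

```python
from urllib.parse import urlparse

def extract_basic_features(url: str) -> dict:
    """Compute a few URL-only features named in the table above.

    The definitions are assumptions for illustration; the project's
    57-feature extractor may compute them differently.
    """
    parsed = urlparse(url)
    digits = sum(ch.isdigit() for ch in url)
    host_labels = (parsed.hostname or "").split(".")
    return {
        "length_url": len(url),
        "nb_dots": url.count("."),
        "nb_hyphens": url.count("-"),
        "nb_at": url.count("@"),
        "nb_slash": url.count("/"),
        "ratio_digits_url": digits / len(url) if url else 0.0,
        # 1 if the path itself contains "http", e.g. an embedded second URL
        "http_in_path": int("http" in parsed.path.lower()),
        # 1 if any host label uses the IDNA "xn--" (punycode) prefix
        "punycode": int(any(label.startswith("xn--") for label in host_labels)),
    }

features = extract_basic_features("http://secure-verify-account.xyz/banking")
print(features)
```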

🚀 Quick Start

🌐 Option 1: Use Live System

No installation needed!

# Just visit:
https://phishing-deployment.onrender.com

✅ Works instantly
✅ No setup required
✅ Production server

💻 Option 2: Run Locally

# Clone repository
git clone https://github.com/Khiladi-786/Phishing_Deployment.git
cd Phishing_Deployment

# Install dependencies
pip install -r requirements.txt

# Launch Flask app
python app.py

🔗 Opens at localhost:5001

🐳 Option 3: Docker Deployment

# Build Docker image
docker build -t phishing-detector .

# Run container
docker run -p 5001:5001 phishing-detector

🎯 Access at localhost:5001

🧪 Option 4: Test via API

import requests

response = requests.post(
    'https://phishing-deployment.onrender.com/predict',
    json={'url': 'https://google.com'}
)
print(response.json())
# {'prediction': 'LEGITIMATE', 'confidence': 0.59}
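
The example response reports 0.59 for a URL that the UI shows at 41% phishing confidence, which suggests that the confidence field refers to the predicted label rather than to the phishing class. A sketch of that mapping follows; the helper name to_api_response, the 0.5 threshold, and the rounding are all assumptions chosen to match the example output, not confirmed server logic.

```python
def to_api_response(p_phishing: float, threshold: float = 0.5) -> dict:
    """Map a phishing probability to a label plus confidence in that label.

    The 0.5 threshold and the response shape are assumptions chosen to
    match the example output above, not the project's confirmed logic.
    """
    if p_phishing >= threshold:
        return {"prediction": "PHISHING", "confidence": round(p_phishing, 2)}
    return {"prediction": "LEGITIMATE", "confidence": round(1 - p_phishing, 2)}

print(to_api_response(0.41))
print(to_api_response(0.90))
```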

🏆 Model Performance

📊 Classification Metrics

| Metric | Score | Visual |
|---|---|---|
| Accuracy | 89.63% | ████████████████████░░ 90% |
| Precision | 89.32% | ████████████████████░░ 89% |
| Recall | 90.03% | ████████████████████░░ 90% |
| F1 Score | 89.67% | ████████████████████░░ 90% |
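
The four metrics are related by standard formulas, so the table can be sanity-checked: F1 is the harmonic mean of precision and recall. The sketch below uses illustrative function names and does not assume any particular confusion matrix for this project.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }

# Consistency check on the reported precision and recall.
print(f"{f1_score(0.8932, 0.9003):.4f}")  # 0.8967, matching the table
```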

Training Details:

  • Algorithm: Random Forest (100 estimators)
  • Features: 57 URL-extractable features
  • Training Set: 9,144 URLs (80%)
  • Test Set: 2,286 URLs (20%)
  • Cross-Validation: Stratified K-Fold
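
The split sizes follow directly from an 80/20 split of the balanced dataset. This README does not say which splitting routine was used, so the following is a standard-library sketch that assumes the split was stratified by class; the function name and seed are illustrative.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with each class split at the same ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Labels mirroring the class balance reported above: 5,715 per class.
labels = [0] * 5715 + [1] * 5715
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 9144 2286
```

Each class contributes 1,143 URLs to the test set, so both partitions stay perfectly balanced.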

Real-World Performance:

  • ✅ https://google.com → LEGITIMATE (41% phishing confidence)
  • 🚨 http://secure-verify-account.xyz/banking → PHISHING (90% confidence)

📁 Project Structure

Phishing_Deployment/
│
├── app.py                       # Flask REST API (port 5001)
├── requirements.txt             # Python dependencies
├── Dockerfile                   # Docker configuration
├── refined_dataset.csv          # Feature column reference
├── README.md                    # Project documentation
│
├── model/
│   └── best_phishing_model.pkl  # Trained Random Forest model
│
├── templates/
│   └── index.html               # Dark-themed UI
│
└── screenshots/
    └── phishing-detector.png    # UI screenshot

🛠️ Tech Stack

  • Python 3.11
  • Flask
  • Docker
  • Scikit-learn
  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn

Additional Tools:

  • 🔍 SHAP: Model explainability
  • 🎨 HTML/CSS: Modern dark UI
  • ☁️ Render: Cloud deployment platform

💡 Key Insights & Research Findings

🔬 Research Conclusion

"Single-feature detection is insufficient for identifying phishing URLs. Multivariate ML models like Random Forest, interpreted through SHAP, are essential for accurate, explainable, real-world cybersecurity applications."

📊 Evidence from Analysis:

1. Feature Correlation Analysis

  • No single feature perfectly separates phishing from legitimate
  • google_index is strongest but not 100% reliable
  • Combination of weak signals creates strong classifier

2. SHAP Explainability

  • Top features: google_index, special_char_ratio, nb_dots
  • Feature interactions critical for accuracy
  • Engineered features add unique predictive power

3. Visual Evidence

  • Pair plots: No clear linear separation
  • Histograms: Significant overlap in distributions
  • Heatmap: Weak-to-moderate pairwise correlations → features carry largely non-redundant signals

🎯 Use Cases

🏢 Enterprise Security

  • Email Gateway Protection
  • Web Browser Extension
  • Corporate Firewall Integration
  • Security Awareness Training

👤 Individual Users

  • Real-time URL Verification
  • Social Media Link Scanning
  • Online Shopping Protection
  • Phishing Education Tool

🔬 Research & Education

  • Cybersecurity Workshops
  • ML Model Explainability Studies
  • Feature Engineering Examples
  • SHAP Analysis Tutorials

🛡️ SOC Teams

  • Threat Intelligence Feeds
  • Incident Response Tools
  • Automated URL Scanning
  • Security Monitoring Dashboards

🔮 Future Roadmap

Planned Enhancements:

  • 🌐 Chrome Extension: browser integration for real-time protection
  • 🤖 Deep Learning Model: LSTM for sequential URL analysis
  • 📊 Advanced Features: SSL certificate validation, WHOIS data
  • 🔄 Active Learning: continuous model updates from user feedback
  • 📱 Mobile App: iOS/Android phishing scanner
  • 🌍 Multi-Language Support: internationalized phishing detection
  • 📈 Analytics Dashboard: threat intelligence visualization
  • 🔗 API Rate Limiting: enterprise-grade API with authentication

👨‍💻 About the Author

Nikhil More

B.Tech CSE (AI/ML) • University of Mumbai (2023–2027)

LinkedIn GitHub Email

Data Science Intern @ Code B Solutions Pvt Ltd
C-DAC Campus Ambassador • Google Student Ambassador • GfG Campus Mantri

🏆 Featured Projects

K-Means Clustering Dashboard

  • 5 customer segments identified
  • Streamlit interactive dashboard
  • Marketing strategy recommendations

🎯 Object Detection

YOLOv8 Real-Time Detection

  • 29 objects detected simultaneously
  • Live webcam + image upload modes
  • 92% confidence on complex scenes

Smart Agriculture ML

  • Soil + weather-based predictions
  • Flask web application
  • 5 crop recommendations

📧 Spam Detection

NLP Text Classifier

  • TF-IDF vectorization
  • High precision spam detection
  • Real-world email dataset

📄 License

MIT License • Free for educational & commercial use

Copyright (c) 2026 Nikhil More

🤝 Contributing

Contributions welcome! Here's how:

# Fork the repository
# Create feature branch
git checkout -b feature/AmazingFeature

# Commit changes
git commit -m 'Add AmazingFeature'

# Push to branch
git push origin feature/AmazingFeature

# Open Pull Request

Ideas for contributions:

  • 🧠 Deep learning models (LSTM, Transformer)
  • 🌐 Browser extension development
  • 📊 Additional feature engineering
  • 🧪 Unit tests & CI/CD
  • 📚 Enhanced documentation

🌟 Show Your Support

⭐ Star This Repository ⭐

If this project helped protect you from phishing, give it a star!

🛡️ Live System • 📖 Docs • 🐛 Issues


Built with ❤️ by Nikhil More | Protecting users from cyber threats with AI

#Cybersecurity #MachineLearning #PhishingDetection #RandomForest #SHAP #Flask #Python #AI


📊 Project Stats

Last Updated: March 2026 • Status: ✅ Production Live
