Try the Live System | Real-time phishing URL detection
Test it now:
- Legitimate: https://google.com → 41% phishing confidence
- Phishing: http://secure-verify-account.xyz/banking → 90% phishing confidence
Protect users from phishing attacks using explainable AI.
A production-ready machine learning system that analyzes URLs in real-time and detects phishing attempts with 89.63% accuracy. Trained on 11,430 URLs, explained with SHAP, and deployed via Flask + Docker.
# One line to detect threats
prediction = model.predict(extract_features(url))
# Returns: "LEGITIMATE" or "PHISHING" + confidence score

| Accuracy | Dataset | Speed | Explainability |
|---|---|---|---|
| 89.63% | 11,430 URLs | <100ms | SHAP Analysis |
Modern Dark UI Interface
What you see:
- Shield icon with gradient background
- Input field: paste any URL
- "Analyze URL" button: instant classification
- Result card:
  - Green checkmark for LEGITIMATE
  - 41.00% phishing confidence (59% legitimate)
  - Progress bar visualization
- Model stats:
  - 93.5% MODEL ACCURACY
  - 11,430 URLs TRAINED
  - 57 FEATURES analyzed
Dataset & EDA Analysis
| Property | Details |
|---|---|
| Total URLs | 11,430 |
| Phishing | 5,715 (50%) |
| Legitimate | 5,715 (50%) |
| Balance | Perfectly balanced |
| Longest URL | 1,641 characters |
URL Length Distribution:
- Mean: 61.1 characters
- Median: 55.0 characters
- Std Dev: 55.3
- Insight: Mean > Median → right-skewed distribution
Visual Analysis:
- Histograms: Legitimate URLs cluster 20-100 chars; Phishing URLs scattered widely
- Pair Plots: Legitimate sites in bottom-left quadrant; Phishing sites scattered
- Correlation Heatmap:
  - length_url ↔ nb_dots: +0.44
  - length_url ↔ ratio_digits_url: +0.45
SHAP Explainability Analysis
1. google_index (most critical)
A site indexed by Google is very likely legitimate, though this signal alone is not fully reliable.
2. special_char_ratio (engineered feature)
Phishing URLs use complex punctuation to obfuscate identity.
This custom feature proved highly significant in SHAP analysis.
3. nb_dots, length_url, ratio_digits_url
Combined weak signals create strong predictive power.
Key Insight:
No single feature can perfectly separate phishing from legitimate URLs.
Random Forest combines all 57 features for accurate detection.
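SHAP attributes a prediction to individual features using Shapley values. To make that idea concrete, here is a small sketch that computes exact Shapley values for a toy three-feature scoring function by averaging marginal contributions over every feature ordering. Everything in it is an assumption for illustration: `toy_score`, its weights, and the instance and baseline values are made up, and it stands in for (rather than reproduces) the trained Random Forest and the SHAP library.

```python
from itertools import permutations


def toy_score(x: dict) -> float:
    """Hypothetical phishing score with made-up weights (not the real model)."""
    score = 0.6 - 0.4 * x["google_index"]
    score += 1.5 * x["special_char_ratio"] + 0.02 * x["nb_dots"]
    # Interaction term: two features jointly change the score.
    score -= 0.5 * x["google_index"] * x["special_char_ratio"]
    return score


def shapley_values(model, instance: dict, baseline: dict) -> dict:
    """Average each feature's marginal contribution over all feature orderings."""
    totals = {name: 0.0 for name in instance}
    orderings = list(permutations(instance))
    for order in orderings:
        current = dict(baseline)  # start with every feature at its baseline value
        previous = model(current)
        for name in order:
            current[name] = instance[name]  # switch one feature to its real value
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


# Made-up example values (assumption).
instance = {"google_index": 0, "special_char_ratio": 0.2, "nb_dots": 3}
baseline = {"google_index": 1, "special_char_ratio": 0.1, "nb_dots": 2}

values = shapley_values(toy_score, instance, baseline)
for name, value in values.items():
    print(f"{name}: {value:+.3f}")
```

The contributions always sum to the difference between the score for the instance and the score for the baseline, which is what lets each prediction be split into per-feature parts. Enumerating all orderings is only feasible for a handful of features; with 57 features, a tree-specific approximation such as the one SHAP provides is needed.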
graph LR
A[User Enters URL] --> B[Feature Extraction]
B --> C[57 Features Computed]
C --> D[Random Forest Model]
D --> E[Prediction + Confidence]
E --> F[LEGITIMATE or PHISHING]
style A fill:#e1f5ff
style D fill:#ffe1e1
style F fill:#e1ffe1
| Step | What Happens | Example Features |
|---|---|---|
| 1. URL Parsing | Extract components | length_url, nb_dots, nb_hyphens |
| 2. Character Analysis | Count special chars | nb_at, nb_slash, ratio_digits_url |
| 3. Domain Analysis | Check domain properties | google_index, tld_in_path, punycode |
| 4. Path Analysis | Examine URL path | nb_redirection, http_in_path |
| 5. Custom Features | Engineered signals | special_char_ratio, total_special_chars |
| 6. Prediction | Random Forest classify | LEGITIMATE (0) or PHISHING (1) |
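The first steps in the table are purely lexical, so they can be sketched with the standard library alone. The function below is a hedged illustration covering a handful of the 57 features. The exact definitions used by the project are not shown in this README, so the treatment of "special characters" (any character that is not a letter or digit), the `special_char_ratio` formula, and the `punycode` check are assumptions. Features that need external lookups, such as `google_index`, are left out.

```python
from urllib.parse import urlparse


def extract_features(url: str) -> dict:
    """Compute a few lexical URL features (a subset; definitions are assumed)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    length = len(url)
    digits = sum(ch.isdigit() for ch in url)
    # Assumption: "special" means any character that is not a letter or digit.
    specials = sum(not ch.isalnum() for ch in url)
    return {
        "length_url": length,
        "nb_dots": url.count("."),
        "nb_hyphens": url.count("-"),
        "nb_at": url.count("@"),
        "nb_slash": url.count("/"),
        "ratio_digits_url": digits / length if length else 0.0,
        # Assumption: flag hostnames containing an IDNA "xn--" label.
        "punycode": int(host.startswith("xn--") or ".xn--" in host),
        "total_special_chars": specials,
        "special_char_ratio": specials / length if length else 0.0,
    }


features = extract_features("http://secure-verify-account.xyz/banking")
for name, value in features.items():
    print(f"{name}: {value}")
```

In the deployed system, the feature dictionary would also have to be ordered to match the columns the model was trained on before calling `predict`.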
No installation needed! Just visit:
https://phishing-deployment.onrender.com
Works instantly.
# Clone repository
git clone https://github.com/Khiladi-786/Phishing_Deployment.git
cd Phishing_Deployment
# Install dependencies
pip install -r requirements.txt
# Launch Flask app
python app.py
Opens on port 5001.
# Build Docker image
docker build -t phishing-detector .
# Run container
docker run -p 5001:5001 phishing-detector
Access on port 5001.
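import requests

# Send the URL to the hosted /predict endpoint as JSON
response = requests.post(
    'https://phishing-deployment.onrender.com/predict',
    json={'url': 'https://google.com'},
    timeout=10,
)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json())
# {'prediction': 'LEGITIMATE', 'confidence': 0.59}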
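The API example reports `confidence: 0.59` for a URL that the UI shows at 41% phishing confidence, which suggests the API returns the probability of the predicted class. The sketch below shows one way that mapping could work. The function name `format_prediction` and the 0.5 decision threshold are assumptions for illustration, not code taken from `app.py`.

```python
def format_prediction(phishing_probability: float, threshold: float = 0.5) -> dict:
    """Turn a phishing probability into a label plus confidence in that label."""
    if not 0.0 <= phishing_probability <= 1.0:
        raise ValueError("phishing_probability must be between 0 and 1")
    is_phishing = phishing_probability >= threshold
    # Confidence refers to the predicted class, so flip it for LEGITIMATE.
    confidence = phishing_probability if is_phishing else 1.0 - phishing_probability
    return {
        "prediction": "PHISHING" if is_phishing else "LEGITIMATE",
        "confidence": round(confidence, 2),
    }


print(format_prediction(0.41))  # {'prediction': 'LEGITIMATE', 'confidence': 0.59}
print(format_prediction(0.90))  # {'prediction': 'PHISHING', 'confidence': 0.9}
```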
| Metric | Score |
|---|---|
| Accuracy | 89.63% |
| Precision | 89.32% |
| Recall | 90.03% |
| F1 Score | 89.67% |
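The F1 score is the harmonic mean of precision and recall, so the table's values can be cross-checked with a few lines of arithmetic. This sketch only uses the precision and recall figures reported above; the helper name `f1_from` is an illustrative choice.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


precision = 0.8932  # reported precision
recall = 0.9003     # reported recall

f1 = f1_from(precision, recall)
print(f"F1 = {f1:.2%}")  # F1 = 89.67%
```

The result agrees with the reported F1 score of 89.67%.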
Training Details:
- Algorithm: Random Forest (100 estimators)
- Features: 57 URL-extractable features
- Training Set: 9,144 URLs (80%)
- Test Set: 2,286 URLs (20%)
- Cross-Validation: Stratified K-Fold
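Stratified K-Fold keeps the class ratio of the full dataset in every fold. The pure-Python sketch below illustrates the idea on labels shaped like this dataset (5,715 of each class). The fold count `k=5` and the seed are assumptions, since the README does not state them; in practice scikit-learn's `StratifiedKFold` would normally be used instead of a hand-written splitter.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs whose class ratios mirror the full set."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets an equal share of it.
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Balanced labels like the dataset: 5,715 legitimate (0) and 5,715 phishing (1).
labels = [0] * 5715 + [1] * 5715

for train_idx, test_idx in stratified_folds(labels, k=5):
    phishing_share = sum(labels[i] for i in test_idx) / len(test_idx)
    print(len(train_idx), len(test_idx), f"{phishing_share:.2f}")
```

With five folds, each split holds 9,144 training URLs and 2,286 test URLs, the same 80/20 proportions listed above, and every test fold is half phishing.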
Real-World Performance:
- https://google.com → LEGITIMATE (41% phishing confidence)
- http://secure-verify-account.xyz/banking → PHISHING (90% confidence)
Phishing_Deployment/
│
├── app.py                        # Flask REST API (port 5001)
├── requirements.txt              # Python dependencies
├── Dockerfile                    # Docker configuration
├── refined_dataset.csv           # Feature column reference
├── README.md                     # Project documentation
│
├── model/
│   └── best_phishing_model.pkl   # Trained Random Forest model
│
├── templates/
│   └── index.html                # Dark-themed UI
│
└── screenshots/
    └── phishing-detector.png     # UI screenshot
Core stack: Python 3.11, Flask, Docker, scikit-learn, Pandas, NumPy, Matplotlib, Seaborn
Additional Tools:
- SHAP: model explainability
- HTML/CSS: modern dark UI
- Render: cloud deployment platform
"Single-feature detection is insufficient for identifying phishing URLs. Multivariate ML models like Random Forest, interpreted through SHAP, are essential for accurate, explainable, real-world cybersecurity applications."
1. Feature Correlation Analysis
- No single feature perfectly separates phishing from legitimate URLs
- google_index is the strongest feature but not 100% reliable
- A combination of weak signals creates a strong classifier
2. SHAP Explainability
- Top features: google_index, special_char_ratio, nb_dots
- Feature interactions are critical for accuracy
- Engineered features add unique predictive power
3. Visual Evidence
- Pair plots: no clear linear separation
- Histograms: significant overlap in distributions
- Heatmap: low pairwise correlations, so the features carry largely non-redundant signals
Planned Enhancements:
- Chrome Extension: browser integration for real-time protection
- Deep Learning Model: LSTM for sequential URL analysis
- Advanced Features: SSL certificate validation, WHOIS data
- Active Learning: continuous model updates from user feedback
- Mobile App: iOS/Android phishing scanner
- Multi-Language Support: internationalized phishing detection
- Analytics Dashboard: threat intelligence visualization
- API Rate Limiting: enterprise-grade API with authentication
B.Tech CSE (AI/ML) • University of Mumbai (2023–2027)
Data Science Intern @ Code B Solutions Pvt Ltd
C-DAC Campus Ambassador • Google Student Ambassador • GfG Campus Mantri
- K-Means Clustering Dashboard
- Object Detection: YOLOv8 Real-Time Detection
- Crop Recommendation: Smart Agriculture ML
- Spam Detection: NLP Text Classifier
MIT License • Free for educational & commercial use
Copyright (c) 2026 Nikhil More
Contributions welcome! Here's how:
# Fork the repository
# Create feature branch
git checkout -b feature/AmazingFeature
# Commit changes
git commit -m 'Add AmazingFeature'
# Push to branch
git push origin feature/AmazingFeature
# Open Pull Request
Ideas for contributions:
- Deep learning models (LSTM, Transformer)
- Browser extension development
- Additional feature engineering
- Unit tests & CI/CD
- Enhanced documentation
If this project helped protect you from phishing, give it a star!
Live System • Docs • Issues
Built with ❤️ by Nikhil More | Protecting users from cyber threats with AI
#Cybersecurity #MachineLearning #PhishingDetection #RandomForest #SHAP #Flask #Python #AI