📧 Email Spam Detection

A machine learning classifier to identify spam emails using Natural Language Processing (NLP) techniques and text feature extraction.

📌 Project Overview

Email spam is a persistent problem: unwanted promotional emails, phishing attempts, and malicious content flood inboxes daily. This project builds an email spam detection system that uses NLP and machine learning to classify emails automatically as spam or legitimate (ham).

Developed as part of my Data Science Internship at Oasis Infobyte.

🎯 Key Highlights

✅ Built an NLP-based text classifier for spam detection
✅ Used TF-IDF vectorization to convert text into numerical features
✅ Trained and compared multiple classification models
✅ Achieved high accuracy on a real-world spam email dataset
✅ Clean Python script ready for production use

📊 Dataset

Property	Details
Source	`spam.csv` — real-world email dataset
Task	Binary Classification (Spam vs Ham)
Features	Email text content
Target	spam / ham (legitimate)
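As a rough sketch of how a dataset of this shape could be loaded with Pandas: the column names `label` and `text` are assumptions (the actual header of `spam.csv` is not documented here), and a two-row in-memory CSV stands in for the real file.

```python
import io

import pandas as pd

# Stand-in for spam.csv; the column names "label" and "text" are assumptions.
SAMPLE_CSV = io.StringIO(
    "label,text\n"
    'spam,"Congratulations! You\'ve won $1000. Click here to claim."\n'
    'ham,"Hey, are we still on for the meeting at 3pm?"\n'
)


def load_emails(source):
    """Read a CSV of labelled emails and add a 0/1 target column."""
    df = pd.read_csv(source)
    df = df.dropna(subset=["label", "text"])
    # Map the spam/ham strings to 1/0 so classifiers can consume them.
    is_spam = (df["label"].str.lower() == "spam").astype(int)
    return df.assign(is_spam=is_spam)


emails = load_emails(SAMPLE_CSV)
print(emails[["label", "is_spam"]])
```

To run this against the real file, pass its path to `load_emails` and adjust the column names to match the actual header.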

🔍 Sample Data

Email Text	Label
"Congratulations! You've won $1000. Click here to claim."	SPAM
"Hey, are we still on for the meeting at 3pm?"	HAM
"URGENT: Your account will be suspended unless you verify now"	SPAM
"Thanks for sending the report, looks great!"	HAM

🧠 Methodology

1. Text Preprocessing

Removed special characters, numbers, and punctuation
Converted all text to lowercase
Removed stop words (common words like "the", "is", "at")
Applied stemming/lemmatization to normalize words
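The steps above can be sketched with the standard library alone. This is a minimal illustration, not the code in `SpamDetection.py`: the stop-word list is a tiny hand-picked sample, and `crude_stem` is a hypothetical stand-in for a real stemmer such as the Porter stemmer in NLTK.

```python
import re

# Tiny illustrative stop-word list; a real run would use a fuller list,
# for example the English list that ships with NLTK.
STOP_WORDS = {"the", "is", "at", "a", "an", "and", "are", "for", "on", "to", "we"}


def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, keep letters only, drop stop words, then stem."""
    text = text.lower()
    # Replace digits, punctuation, and other non-letters with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(crude_stem(t) for t in tokens)


print(preprocess("Hey, are we still on for the meeting at 3pm?"))
# prints: hey still meet pm
```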

2. Feature Extraction

Used TF-IDF (Term Frequency-Inverse Document Frequency) vectorization
Converted email text into numerical feature vectors
Captured word importance across the entire dataset
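A small example may help show what TF-IDF does. The sketch below uses scikit-learn's `TfidfVectorizer` with default settings on three made-up documents; the settings used in the actual script are not stated here, so treat the defaults as an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Three made-up documents, for illustration only.
docs = [
    "win free prize now",
    "free meeting notes attached",
    "meeting moved to friday",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix: one row per document


def idf_of(word):
    """Look up the learned inverse document frequency of a word."""
    return vectorizer.idf_[vectorizer.vocabulary_[word]]


print(X.shape)  # (documents, distinct words)
# "free" appears in two documents and "prize" in one, so "prize" gets
# the larger IDF weight: rarer words count as more distinctive.
print(idf_of("free") < idf_of("prize"))
```

Each row of `X` is the numerical feature vector for one email, which is the input the classifiers are trained on.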

3. Model Training

Trained multiple classifiers (Naive Bayes, Logistic Regression, SVM)
Selected the best model based on accuracy and precision
Used cross-validation to check for overfitting
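One way such a comparison could look with scikit-learn is sketched below. The twelve training texts are invented, the hyperparameters are defaults, and `LinearSVC` is assumed as the SVM variant; none of this is confirmed to match `SpamDetection.py`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy data: six spam and six ham messages.
texts = [
    "win a free prize now click here",
    "urgent verify your account now",
    "congratulations you won free cash",
    "claim your free reward today",
    "free offer click here to win",
    "urgent your account will be suspended",
    "are we still on for the meeting",
    "thanks for sending the report",
    "see you at lunch tomorrow",
    "please review the attached report",
    "the meeting moved to friday",
    "can you send the meeting notes",
]
labels = [1] * 6 + [0] * 6  # 1 = spam, 0 = ham

candidates = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM (linear)": LinearSVC(),
}

mean_scores = {}
for name, clf in candidates.items():
    # Keeping TF-IDF inside the pipeline refits it on each training fold,
    # so no vocabulary statistics leak in from the held-out fold.
    model = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(model, texts, labels, cv=3, scoring="accuracy")
    mean_scores[name] = scores.mean()

best = max(mean_scores, key=mean_scores.get)
for name, score in mean_scores.items():
    print(f"{name}: {score:.2f}")
print("Best by mean accuracy:", best)
```

Passing `scoring="precision"` instead ranks the models by how rarely they mislabel legitimate email as spam.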

🛠️ Tech Stack

Tool	Purpose
Python	Core programming language
Pandas	Data manipulation
Scikit-learn	ML models & TF-IDF vectorization
NLTK / SpaCy	Text preprocessing & NLP
NumPy	Numerical operations

🏆 Model Results

Metric	Score
Accuracy	(Add your score)
Precision	(Add your score)
Recall	(Add your score)
F1 Score	(Add your score)

💡 Run SpamDetection.py to see the full evaluation metrics.
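For reference, the four metrics in the table can be computed from true and predicted labels as sketched below. The labels in the example are made up to show the arithmetic and are not results from this project.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels (1 = spam, 0 = ham): 2 true positives, 1 false negative,
# 1 false positive, 4 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these made-up labels, accuracy is 6/8 = 0.75, and precision, recall, and F1 are each 2/3.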

🚀 How to Run

1. Clone the repository

git clone https://github.com/Khiladi-786/Email-Spam-Detection.git
cd Email-Spam-Detection

2. Install dependencies

pip install pandas scikit-learn nltk numpy

3. Run the classifier

python SpamDetection.py

📁 Project Structure

Email-Spam-Detection/
│
├── SpamDetection.py      # Main spam detection script
├── spam.csv              # Email dataset
└── README.md             # Project documentation

💡 Key Insights

Common spam indicators detected by the model:

Words like "free", "win", "urgent", "click here", "congratulations"
Excessive use of ALL CAPS and exclamation marks!!!
Suspicious links and URLs
Poor grammar and spelling errors
Requests for personal information or account verification
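Some of these indicators (capitalization, exclamation marks, URLs) are surface features that are lost once text is lowercased and stripped of punctuation. The sketch below shows one hypothetical way to measure them on raw text. The function `spam_signals` and its term list are illustrative assumptions, not part of the project's script.

```python
import re

# Terms taken from the list above; matching is on whole words or phrases.
SPAM_TERMS = ("free", "win", "urgent", "click here", "congratulations")
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def spam_signals(text):
    """Return simple surface-level indicators for one raw email."""
    lowered = text.lower()
    letters = [c for c in text if c.isalpha()]
    caps_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return {
        "spam_terms": [
            term
            for term in SPAM_TERMS
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)
        ],
        "caps_ratio": round(caps_ratio, 2),  # share of letters in upper case
        "exclamations": text.count("!"),
        "urls": len(URL_PATTERN.findall(text)),
    }


signals = spam_signals("URGENT!!! Click here to WIN: http://example.com")
print(signals)
```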

How the model works:

Email text is preprocessed (cleaned and normalized)
TF-IDF converts text into numerical features
Classifier predicts spam/ham based on word patterns
High-confidence predictions flag suspicious emails
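Put together, the flow above might look like the following sketch. It assumes a Naive Bayes classifier and a probability threshold of 0.9 for "high confidence"; both are illustrative choices, and the training messages are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented training data (1 = spam, 0 = ham).
train_texts = [
    "win a free prize click here now",
    "urgent verify your account click here",
    "congratulations you won free cash",
    "free offer claim your prize now",
    "are we still on for the meeting",
    "thanks for sending the report",
    "see you at lunch tomorrow",
    "the report for the meeting is attached",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# TfidfVectorizer lowercases and tokenizes; MultinomialNB does the prediction.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)


def flag_spam(messages, threshold=0.9):
    """Return (message, spam probability, flagged) for each message."""
    spam_col = list(model.classes_).index(1)
    probs = model.predict_proba(messages)[:, spam_col]
    return [(msg, float(p), bool(p >= threshold)) for msg, p in zip(messages, probs)]


results = flag_spam(["claim your free prize", "thanks for the meeting report"])
for msg, prob, flagged in results:
    print(f"{prob:.2f}  flagged={flagged}  {msg}")
```

Raising the threshold trades recall for precision: fewer legitimate emails get flagged, but more spam slips through.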

👨‍💻 About the Author

Nikhil More, B.Tech CSE (AI/ML), University of Mumbai (2023–2027)

Data Science Intern @ Oasis Infobyte | C-DAC Ambassador | Google Student Ambassador

📄 License

This project is licensed under the MIT License.

⭐ If you found this project useful, please give it a star!
