A full-stack movie recommendation system that combines keyword-based similarity and semantic similarity using modern NLP techniques, deployed as a web application using Flask.
This project demonstrates the end-to-end lifecycle of an ML system — from data preprocessing and feature engineering to model inference and web deployment.
- Hybrid recommendation approach:
  - Bag-of-Words (CountVectorizer + Cosine Similarity)
  - Semantic similarity using Sentence-BERT
- Fuzzy matching for robust movie name input
- Clean and minimal web interface
- Fast inference using precomputed vectors and embeddings
- Modular and production-oriented code structure
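As an illustration of the fuzzy matching feature listed above, here is a minimal sketch that resolves a misspelled movie name to a catalogue title. It assumes the standard-library `difflib` module; the README does not say which fuzzy-matching library the project actually uses, and `resolve_title` and the `cutoff` value are hypothetical.

```python
import difflib


def resolve_title(query, titles, cutoff=0.6):
    """Return the catalogue title closest to `query`, or None if nothing is close enough."""
    # Compare case-insensitively, but return the title in its original casing.
    lowered = {title.lower(): title for title in titles}
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None


# Hypothetical catalogue, for illustration only.
titles = ["Interstellar", "Inception", "The Dark Knight"]
print(resolve_title("intersteller", titles))  # misspelled input
print(resolve_title("zzzz", titles))          # nothing similar enough
```

Returning `None` for a poor match lets the web layer show a "movie not found" message instead of recommending from an unrelated title.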
The system computes movie similarity using a weighted hybrid score:
Final Score = 0.6 × BoW Similarity + 0.4 × Semantic Similarity
This balances:
- Lexical overlap (genres, keywords, cast, crew)
- Contextual meaning (semantic understanding of movie descriptions)
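The weighted score above can be sketched in plain Python. This is not the project's code: it uses small hand-written vectors in place of the real CountVectorizer output and SBERT embeddings, and the names `cosine` and `hybrid_scores` are hypothetical. Only the 0.6 and 0.4 weights come from the formula.

```python
import math

BOW_WEIGHT = 0.6       # weight of the Bag-of-Words similarity
SEMANTIC_WEIGHT = 0.4  # weight of the semantic (SBERT) similarity


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_scores(query_idx, bow_vectors, embeddings):
    """Rank every other movie by the weighted hybrid score, best first."""
    scores = []
    for i in range(len(bow_vectors)):
        if i == query_idx:
            continue  # never recommend the query movie itself
        bow_sim = cosine(bow_vectors[query_idx], bow_vectors[i])
        sem_sim = cosine(embeddings[query_idx], embeddings[i])
        scores.append((i, BOW_WEIGHT * bow_sim + SEMANTIC_WEIGHT * sem_sim))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data: movie 1 shares keywords with movie 0, movie 2 shares meaning with it.
bow_vectors = [[1, 0, 0], [1, 0, 0], [0, 0, 1]]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
ranked = hybrid_scores(0, bow_vectors, embeddings)
print(ranked)
```

With these weights, a movie that matches only on keywords outranks one that matches only on meaning, which is the trade-off the 0.6/0.4 split encodes.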
The project uses multiple CSV files derived from the TMDB movie dataset, including:
- Movie metadata
- Cast and crew information
- Keywords and genres
- Poster paths
Since these files are not row-aligned, an ID-based data integration pipeline was implemented to ensure correctness.
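To show why joining on IDs matters, the following dependency-free sketch pairs rows from two tables whose row order differs. The column names `id` and `movie_id` and the helper `join_on_id` are assumptions for illustration; in a pandas pipeline the same step would typically be a `DataFrame.merge` on the ID column.

```python
def join_on_id(movies, credits, movie_key="id", credit_key="movie_id"):
    """Inner-join two lists of row dicts on their ID columns, ignoring row order."""
    credits_by_id = {row[credit_key]: row for row in credits}
    merged = []
    for movie in movies:
        credit = credits_by_id.get(movie[movie_key])
        if credit is None:
            continue  # no credits for this ID: skip rather than mis-pair rows
        # Copy every credit column except the duplicate ID column.
        extra = {k: v for k, v in credit.items() if k != credit_key}
        merged.append({**movie, **extra})
    return merged


# Hypothetical rows: the credits table is deliberately in a different order.
movies = [{"id": 1, "title": "A"}, {"id": 2, "title": "B"}]
credits = [{"movie_id": 2, "cast": "Y"}, {"movie_id": 1, "cast": "X"}]
merged = join_on_id(movies, credits)
print(merged)
```

Pairing these rows by position would attach cast "Y" to movie "A"; joining on the ID avoids that silent mismatch.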
⚠️ Raw datasets are excluded due to size and licensing constraints. See data/README.md for preprocessing details.
movie-recommender/
│
├── data/
│ ├── processed_movies.csv
│ ├── recommendations.json
│ └── README.md
│
├── notebooks/
│ └── data_preprocessing.ipynb
│
├── models/
│ ├── vectors.npz
│ ├── sbert_embeddings.npy
│ ├── stopwords.pkl
│ └── sbert_model/
│
├── templates/
│ └── index.html
│
├── static/
│ └── script.js
│
├── app.py
├── recommender.py
├── requirements.txt
├── README.md
└── .gitignore
- Python
- Pandas, NumPy
- Scikit-learn
- Sentence-Transformers (SBERT)
- PyTorch
- Flask
- HTML, CSS, JavaScript
- Offline preprocessing and feature extraction
- Cached BoW vectors and SBERT embeddings
- Models and data loaded once at server startup
- No recomputation during user requests
Because each request only reads cached data, this design keeps request latency low and makes the app simpler to scale for web deployment.
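The load-once pattern can be sketched without any web framework. This is an assumption-laden illustration rather than the project's `app.py`: `get_artifacts`, `handle_request`, and the placeholder data are hypothetical, and `functools.lru_cache` is just one way to guarantee a single load.

```python
import functools

LOAD_COUNT = 0  # exists only to demonstrate that loading happens once


@functools.lru_cache(maxsize=1)
def get_artifacts():
    """Load precomputed artifacts on the first call; later calls reuse the cached object."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    # Placeholder data: a real app would read cached vectors and embeddings from disk here.
    return {"titles": ["Interstellar", "Inception"]}


def handle_request(title):
    """Simulate a request handler that only reads the already-loaded artifacts."""
    artifacts = get_artifacts()
    return title in artifacts["titles"]


get_artifacts()  # warm the cache once, as a server would at startup
results = [handle_request(t) for t in ["Interstellar", "Unknown", "Inception"]]
print(results, LOAD_COUNT)
```

However many requests arrive, the expensive load runs once, so request time is spent on lookups rather than on reading or recomputing models.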
git clone https://github.com/your-username/movie-recommender.git
cd movie-recommender
conda create -n movie-recommender python=3.10
conda activate movie-recommender
pip install -r requirements.txt
python app.py
Visit:
http://127.0.0.1:5000
[
{
"title": "Interstellar",
"poster": "https://image.tmdb.org/t/p/w500/xyz.jpg",
"score": 0.87
}
]
- Importance of ID-based data alignment
- Hybrid recommendation system design
- ML model deployment pitfalls and fixes
- Flask + ML integration best practices
- Performance optimization for inference-time systems
- User-based collaborative filtering
- Search auto-completion
- Explanation for recommendations
- Cloud deployment (Docker / Render)
- User feedback loop
Harshavardhan
B.Tech CSE, NIT Trichy
Interests: NLP, Recommendation Systems, ML Systems, Applied AI
- TMDB for the dataset
- Sentence-Transformers library
- Open-source ML community