Movie Genre Clustering Notebook

This Jupyter Notebook runs a complete pipeline for clustering movies by genre and user rating (on a scale from 0.5 to 5). It reads a CSV of movie ratings, enriches each entry with genre information fetched from TMDb, explores genre-based rating trends, and applies K‑Means clustering with a PCA‑based 2D visualization.

Features

Data Loading
- Reads ratings.csv containing columns: Name, Year, Rating (0.5–5).
- Note: ratings.csv is generated by exporting your data from Letterboxd.
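
A minimal sketch of the loading step, assuming the export uses exactly the column names listed above. The `load_ratings` helper and the sample rows are hypothetical and not taken from the notebook.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = ["Name", "Year", "Rating"]


def load_ratings(source):
    """Read a ratings export and keep only the three columns used here.

    `source` may be a path such as "ratings.csv" or any file-like object.
    """
    df = pd.read_csv(source)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"ratings file is missing columns: {missing}")
    return df[REQUIRED_COLUMNS].copy()


# Made-up in-memory sample standing in for ratings.csv; the extra
# "Date" column shows that unrelated columns are simply ignored.
sample_csv = io.StringIO(
    "Date,Name,Year,Rating\n"
    "2024-01-01,Alien,1979,4.5\n"
    "2024-01-02,Heat,1995,4.0\n"
)
ratings = load_ratings(sample_csv)
print(ratings)
```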
Genre Enrichment
- Uses TMDb API to search each movie by title and year.
- Falls back to title‑only search if needed.
- Retrieves genre list for each found movie.
- Logs (with emojis) any movies not found or with missing genre data.
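
The lookup described above could be sketched roughly as follows. This is an illustration, not the notebook's actual code: the function names, the injectable `fetch` parameter, and the choice to read genre names from the TMDb movie details endpoint are all assumptions.

```python
import time

import requests

TMDB_SEARCH_URL = "https://api.themoviedb.org/3/search/movie"
TMDB_MOVIE_URL = "https://api.themoviedb.org/3/movie/{movie_id}"


def get_json(url, params):
    """Default HTTP helper: GET a URL and return the decoded JSON body."""
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()
    return response.json()


def fetch_genres(title, year, api_key, fetch=get_json, delay=0.25):
    """Return a list of genre names for one movie, or [] if nothing is found.

    `fetch` is a hypothetical hook so the HTTP layer can be swapped out.
    """
    # First attempt: search by title and release year.
    params = {"api_key": api_key, "query": title, "year": year}
    results = fetch(TMDB_SEARCH_URL, params).get("results", [])
    time.sleep(delay)
    if not results:
        # Fallback: title-only search.
        params = {"api_key": api_key, "query": title}
        results = fetch(TMDB_SEARCH_URL, params).get("results", [])
        time.sleep(delay)
    if not results:
        print(f"❌ Not found: {title} ({year})")
        return []
    # Assumption: take the first hit and read genre names from its details.
    details_url = TMDB_MOVIE_URL.format(movie_id=results[0]["id"])
    details = fetch(details_url, {"api_key": api_key})
    time.sleep(delay)
    genres = [genre["name"] for genre in details.get("genres", [])]
    if not genres:
        print(f"⚠️ No genre data: {title} ({year})")
    return genres
```

Calling `fetch_genres("Alien", 1979, API_KEY)` would then issue the real requests. Taking the first search hit is a simplification and may pick the wrong film when several share a title.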
Exploratory Analysis
- Explodes genre lists and computes average rating per genre.
- Displays a horizontal bar chart of average genre ratings.
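
As a rough illustration of the explode-and-average step, assuming each movie row carries a list of genre names in a column called `Genres` (a hypothetical column name, with made-up sample rows):

```python
import pandas as pd

# Made-up sample: each movie carries a list of genre names.
movies = pd.DataFrame(
    {
        "Name": ["Alien", "Heat", "Scream"],
        "Rating": [4.5, 4.0, 3.0],
        "Genres": [["Horror", "Science Fiction"], ["Crime", "Thriller"], ["Horror"]],
    }
)


def average_rating_by_genre(df):
    """Return the mean rating per genre, sorted from lowest to highest."""
    # explode() turns one row with N genres into N rows with one genre each.
    exploded = df.explode("Genres").dropna(subset=["Genres"])
    return exploded.groupby("Genres")["Rating"].mean().sort_values()


genre_means = average_rating_by_genre(movies)
print(genre_means)
```

With matplotlib installed, `genre_means.plot.barh()` draws the result as a horizontal bar chart.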
Data Preparation
- One‑hot encodes genres.
- Combines one‑hot genre columns with the Rating.
- Standardizes all features to equalize scale.
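
One way to build such a feature matrix is sketched below. It assumes scikit-learn's `MultiLabelBinarizer` and `StandardScaler`; the notebook may use different tools, and the `Genres` column name and sample rows are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer, StandardScaler

# Made-up sample with a list of genre names per movie.
movies = pd.DataFrame(
    {
        "Name": ["Alien", "Heat", "Scream"],
        "Rating": [4.5, 4.0, 3.0],
        "Genres": [["Horror", "Science Fiction"], ["Crime", "Thriller"], ["Horror"]],
    }
)


def build_feature_matrix(df):
    """One-hot encode genres, append Rating, and standardize every column."""
    encoder = MultiLabelBinarizer()
    genre_matrix = encoder.fit_transform(df["Genres"])
    features = pd.DataFrame(genre_matrix, columns=encoder.classes_, index=df.index)
    features["Rating"] = df["Rating"]
    # StandardScaler rescales each column to zero mean and unit variance,
    # so the 0.5-5 rating does not dominate the 0/1 genre flags.
    scaled = StandardScaler().fit_transform(features)
    return pd.DataFrame(scaled, columns=features.columns, index=df.index)


X = build_feature_matrix(movies)
print(X.round(2))
```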
Elbow Method for K Selection
- Computes K‑Means inertia for K = 1 to 10.
- Plots the “elbow” chart to help choose the optimal number of clusters.
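
The inertia sweep can be sketched as below. The synthetic blobs stand in for the standardized feature matrix, and the `compute_inertias` helper, `n_init`, and `random_state` values are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans


def compute_inertias(X, k_values=range(1, 11), random_state=42):
    """Fit K-Means once per K and return {K: inertia}."""
    inertias = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(X)
        # inertia_ is the sum of squared distances to the nearest centroid.
        inertias[k] = model.inertia_
    return inertias


# Synthetic stand-in for the feature matrix: three well-separated 2D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 5], [0, 5]])
X = np.vstack([rng.normal(center, 0.5, size=(30, 2)) for center in centers])

inertias = compute_inertias(X)
for k, value in inertias.items():
    print(f"K={k:2d}  inertia={value:.1f}")
```

Plotting K against inertia (for example with `plt.plot(list(inertias), list(inertias.values()), marker="o")`) gives the elbow chart. On this synthetic data the curve flattens after K = 3, but real rating data often shows a less distinct bend.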
Clustering & Visualization
- Applies K‑Means with the chosen K.
- Reduces feature space to two principal components (PCA).
- Displays an interactive Plotly scatter plot, where each point is a movie and color denotes its cluster. Hover tooltips show movie name, year, and rating.
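
A hedged sketch of the clustering and projection step follows. It uses synthetic data in place of the real feature matrix, and the `cluster_and_project` helper and its column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def cluster_and_project(X, k, random_state=42):
    """Fit K-Means on X and return cluster labels plus a 2D PCA projection."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
    coords = PCA(n_components=2, random_state=random_state).fit_transform(X)
    return pd.DataFrame(
        {
            "PC1": coords[:, 0],
            "PC2": coords[:, 1],
            # Strings make plotting libraries treat clusters as categories.
            "Cluster": labels.astype(str),
        }
    )


# Synthetic stand-in for the feature matrix: three blobs in 4 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(center, 0.5, size=(20, 4)) for center in (0.0, 4.0, 8.0)])

projected = cluster_and_project(X, k=3)
print(projected.head())
```

After joining the movie's Name, Year, and Rating columns onto this frame, a call such as `plotly.express.scatter(df, x="PC1", y="PC2", color="Cluster", hover_data=["Name", "Year", "Rating"])` would produce the interactive scatter plot. Note that clustering runs on the full feature space; PCA is used here only for display.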
Cluster Interpretation
- Prints summary for each cluster:
  - Number of movies
  - Average rating
  - Top 5 genres by percentage
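
The summary could be computed along these lines. This sketch assumes "percentage" means the share of movies in a cluster tagged with a given genre; the `summarize_clusters` helper, column names, and sample rows are made up for illustration.

```python
import pandas as pd

# Made-up sample with cluster labels already assigned.
movies = pd.DataFrame(
    {
        "Name": ["Alien", "Scream", "Heat", "Collateral"],
        "Rating": [4.5, 3.5, 4.0, 3.0],
        "Genres": [
            ["Horror", "Science Fiction"],
            ["Horror"],
            ["Crime", "Thriller"],
            ["Crime", "Thriller"],
        ],
        "Cluster": [0, 0, 1, 1],
    }
)


def summarize_clusters(df, top_n=5):
    """Return {cluster: count, average rating, top genres by percentage}."""
    summaries = {}
    for cluster_id, group in df.groupby("Cluster"):
        # Share of movies in this cluster tagged with each genre, in percent.
        genre_share = group["Genres"].explode().value_counts() / len(group) * 100
        summaries[cluster_id] = {
            "count": len(group),
            "avg_rating": group["Rating"].mean(),
            "top_genres": genre_share.head(top_n).round(1).to_dict(),
        }
    return summaries


summaries = summarize_clusters(movies)
for cluster_id, info in summaries.items():
    print(f"Cluster {cluster_id}: {info['count']} movies, "
          f"average rating {info['avg_rating']:.2f}")
    for genre, pct in info["top_genres"].items():
        print(f"  {genre}: {pct}%")
```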

Requirements

Python 3.7+

Install dependencies:

pip install requests pandas numpy matplotlib scikit-learn plotly tqdm

A valid TMDb API key. Set it in the notebook cell:
```
API_KEY = 'YOUR_TMDB_API_KEY'
```

How to Use

1. Export your movie ratings from Letterboxd to ratings.csv and place the file in the same folder as this notebook.
2. Open the notebook in JupyterLab, Jupyter Notebook, or any compatible environment.
3. Install the required libraries if you haven’t already.
4. Enter your TMDb API key in the designated cell.
5. Run the cells in order.
   - The enrichment step waits 0.25 s per request, so its run time grows with dataset size: 1,000 requests add roughly 4 minutes of delay alone.
   - Check the printed logs for movies that were not found or that have no genre data.
6. After viewing the elbow plot, tune the number of clusters (K).
7. Explore the interactive cluster visualization and review the printed cluster summaries.

Created with ❤️ for data‑driven movie analysis!
