Project: Image Recommendation System

Overview

In this project, you will build an Image Recommendation System that suggests images to users based on their preferences. This project applies all the skills you learned in the practical sessions: data parsing, visualization, clustering, classification, and machine learning.

Duration: 3 practical sessions Team Size: 2-3 students Deliverables:

A Jupyter notebook (Name1_Name2_[Name3].ipynb)
A 5-page summary report (PDF)

Learning Objectives

By completing this project, you will:

Automate data collection from web sources
Extract and process image metadata
Apply clustering algorithms to analyze image features
Build user preference profiles
Implement a recommendation algorithm
Visualize data insights effectively
Write comprehensive tests for your system

Project Architecture

The system consists of 7 interconnected tasks:

┌─────────────────────────────────────────────────────────────────┐
│                    IMAGE RECOMMENDATION SYSTEM                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│  │ 1. Data      │───▶│ 2. Labeling  │───▶│ 3. Data      │       │
│  │ Collection   │    │ & Annotation │    │ Analysis     │       │
│  └──────────────┘    └──────────────┘    └──────────────┘       │
│        │                    │                   │                │
│        ▼                    ▼                   ▼                │
│  ┌──────────────────────────────────────────────────────┐       │
│  │              JSON Files (Metadata Storage)            │       │
│  └──────────────────────────────────────────────────────┘       │
│        │                    │                   │                │
│        ▼                    ▼                   ▼                │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│  │ 4. Data      │    │ 5. Recommend-│    │ 6. Tests     │       │
│  │ Visualization│    │ ation System │    │              │       │
│  └──────────────┘    └──────────────┘    └──────────────┘       │
│                             │                                    │
│                             ▼                                    │
│                    ┌──────────────┐                              │
│                    │ 7. Summary   │                              │
│                    │    Report    │                              │
│                    └──────────────┘                              │
└─────────────────────────────────────────────────────────────────┘

Task 1: Data Collection

Goal

Collect at least 100 open-licensed images with their metadata.

What You Need to Do

Create a folder structure:

project/
├── images/           # Downloaded images
├── data/             # JSON metadata files
└── project.ipynb     # Your notebook

Find image sources (choose one or more):
- Wikimedia Commons - Use SPARQL queries (like in Practical 1)
- Unsplash API - Free API for high-quality images
- Pexels API - Free stock photos
- Flickr API - Creative Commons images
Download images programmatically using the techniques from Practical 1, Exercise 6
Extract and save metadata for each image:
- Image filename
- Image dimensions (width, height)
- File format (.jpg, .png, etc.)
- File size (in KB)
- Source URL
- License information
- EXIF data (if available): camera model, date taken, etc.

Expected Output

images/ folder with 100+ images
data/images_metadata.json containing metadata for all images

Hints

Use PIL to get image dimensions
Use os.path.getsize() to get file size
Use EXIF extraction (see Practical 2, Exercise 2)
Store metadata as a list of dictionaries in JSON format

Task 2: Labeling and Annotation

Goal

Add descriptive labels and computed features to each image.

What You Need to Do

Extract color information using KMeans clustering (Practical 2, Exercise 3):
- Find the 3-5 predominant colors in each image
- Store colors as RGB values and/or color names
Determine image orientation:
- Landscape (width > height)
- Portrait (height > width)
- Square (width ≈ height)
Add category tags (choose an approach):
- Manual: Create a simple interface to tag images
- Automated: Use image source categories/tags
- Hybrid: Start with source tags, allow user refinement
Classify image size:
- Thumbnail: < 500px
- Medium: 500-1500px
- Large: > 1500px

Expected Output

data/images_labels.json with enriched metadata:

{
  "image_001.jpg": {
    "predominant_colors": [[255, 128, 0], [0, 100, 200], [50, 50, 50]],
    "color_names": ["orange", "blue", "gray"],
    "orientation": "landscape",
    "size_category": "medium",
    "tags": ["nature", "sunset", "beach"]
  }
}

Hints

Reuse your KMeans color extraction code from Practical 2
Consider using a color name mapping (RGB → color name)
Store all annotations in a structured JSON file

Task 3: Data Analysis

Goal

Build user preference profiles based on their image selections.

What You Need to Do

Simulate users (create at least 5 users):
- Each user "favorites" 10-20 images
- Users should have different preferences (one likes nature, another likes architecture, etc.)

Build user profiles by analyzing their favorite images:

user_profile = {
    "user_id": "user_001",
    "favorite_colors": ["blue", "green"],      # Most common colors
    "favorite_orientation": "landscape",        # Most common orientation
    "favorite_size": "medium",                  # Most common size
    "favorite_tags": ["nature", "water"],       # Most common tags
    "favorite_images": ["img_01.jpg", ...]      # List of favorited images
}

Analyze patterns across users:
- Which colors are most popular overall?
- Which tags appear most frequently?
- Are there clusters of users with similar preferences?

Expected Output

data/users.json with user profiles
Analysis results showing user preference patterns

Hints

Use pandas for data analysis (groupby, value_counts)
Use Counter from collections to find most common items
Consider using clustering to group similar users

Task 4: Data Visualization

Goal

Create visualizations that reveal insights about your image collection and user preferences.

Required Visualizations

Image Collection Statistics:
- Bar chart: Number of images per orientation
- Bar chart: Number of images per size category
- Pie chart: Distribution of image formats
Color Analysis:
- Display predominant colors as color palettes
- Histogram of color frequencies across all images
User Preferences:
- Bar chart: Favorite colors per user
- Comparison chart: User preferences side-by-side
Tag Analysis:
- Bar chart: Most common tags
- Word cloud (optional): Tag frequency visualization

Expected Output

At least 6 different visualizations in your notebook
All plots should have titles, labels, and legends

Hints

Use matplotlib for all visualizations (Practical 2, Exercise 1)
Save important plots using plt.savefig()
Use subplots to group related visualizations

Task 5: Recommendation System

Goal

Implement a system that recommends images to users based on their preferences.

Choose Your Approach

You must implement at least one of these approaches:

Option A: Content-Based Filtering (Using Classification)

Recommend images similar to what the user has liked before.

# Train a classifier on user's favorites
# Features: color, orientation, size, tags
# Label: Favorite / Not Favorite
# Predict which unseen images the user would like

Use: Decision Tree, Random Forest, or SVM (Practical 3, Exercises 2-3)

Option B: Clustering-Based Recommendation

Group similar images together and recommend from the same cluster.

# Cluster all images based on features
# Find which cluster the user's favorites belong to
# Recommend other images from the same cluster

Use: KMeans (Practical 2, Exercises 3-5)

Option C: Hybrid Approach

Combine both methods for better recommendations.

Implementation Requirements

Input: User ID
Output: List of 5-10 recommended images (not already favorited)
Explanation: Brief reason why each image is recommended

Expected Output

def recommend_images(user_id, n_recommendations=5):
    """
    Recommend images for a user.

    Args:
        user_id: The user to recommend for
        n_recommendations: Number of images to recommend

    Returns:
        List of (image_filename, reason) tuples
    """
    # Your implementation
    pass

Hints

Start with the examples in examples/recommendation.ipynb
Use LabelEncoder to convert categorical features to numbers
Test your recommendations manually - do they make sense?

Task 6: Tests

Goal

Verify that your system works correctly.

Required Tests

Data Validation Tests:
- All images exist in the images folder
- All images have metadata
- Metadata values are valid (no negative dimensions, etc.)
Function Tests:
- Color extraction returns valid RGB values
- User profile generation works correctly
- Recommendation function returns the expected number of results
Recommendation Quality Tests:
- Recommended images are not already in user's favorites
- Recommendations match user preferences (e.g., if user likes blue images, recommendations should include blue images)

Expected Output

def test_data_integrity():
    """Test that all data is valid"""
    # Your tests
    assert len(images) >= 100, "Need at least 100 images"
    assert all_images_have_metadata(), "Missing metadata"

def test_recommendation_system():
    """Test that recommendations work"""
    recommendations = recommend_images("user_001", 5)
    assert len(recommendations) == 5, "Should return 5 recommendations"
    # More tests...

Hints

Use assert statements for simple tests
Print clear pass/fail messages
Test edge cases (empty user profile, new user, etc.)

Task 7: Summary Report

Goal

Write a 5-page report summarizing your project.

Required Sections

Introduction (0.5 page)
- Project goal
- Your approach in brief
Data Collection (0.5 page)
- Image sources and licenses
- Number of images collected
- Metadata stored
Methodology (2.5 pages)
- Labeling approach (how you extracted features)
- User profile construction
- Recommendation algorithm chosen and why
- Include architecture diagram
Results (1 page)
- Key visualizations (2-3 figures)
- Recommendation accuracy/quality
- Interesting findings
Limitations and Future Work (0.25 page)
- What didn't work well?
- How could it be improved?
Conclusion (0.25 page)
- Summary of achievements
- Self-evaluation

Format

5 pages maximum
PDF format
No code in the report (only results and explanations)
Include references/bibliography

Evaluation Criteria

Task	Points	Key Criteria
Data Collection	15%	Automation, 100+ images, complete metadata
Labeling & Annotation	15%	Color extraction, proper categorization
Data Analysis	15%	User profiles, preference analysis
Data Visualization	15%	6+ visualizations, proper formatting
Recommendation System	20%	Working algorithm, reasonable recommendations
Tests	10%	Comprehensive tests, all pass
Summary Report	10%	Clear, complete, well-structured

Submission

Files to Submit

Name1_Name2_[Name3].zip
├── Name1_Name2_[Name3].ipynb    # Your notebook
├── data/
│   ├── images_metadata.json
│   ├── images_labels.json
│   └── users.json
└── summary_report.pdf

Important Notes

DO NOT submit the images folder (too large)
Ensure your notebook runs without errors
Include comments explaining your code
Rename files with your team member names

Getting Started

Start with the template notebook: en/Project/project.ipynb
Review the examples in examples/recommendation.ipynb
Reuse code from your practical sessions
Work incrementally - complete each task before moving to the next

Good luck!

FilesExpand file tree

project.md

Latest commit

History

project.md

File metadata and controls

Project: Image Recommendation System

Overview

Learning Objectives

Project Architecture

Task 1: Data Collection

Goal

What You Need to Do

Expected Output

Hints

Task 2: Labeling and Annotation

Goal

What You Need to Do

Expected Output

Hints

Task 3: Data Analysis

Goal

What You Need to Do

Expected Output

Hints

Task 4: Data Visualization

Goal

Required Visualizations

Expected Output

Hints

Task 5: Recommendation System

Goal

Choose Your Approach

Option A: Content-Based Filtering (Using Classification)

Option B: Clustering-Based Recommendation

Option C: Hybrid Approach

Implementation Requirements

Expected Output

Hints

Task 6: Tests

Goal

Required Tests

Expected Output

Hints

Task 7: Summary Report

Goal

Required Sections

Format

Evaluation Criteria

Submission

Files to Submit

Important Notes

Getting Started