ML Methodology: Restaurant Recommendation System

Author: Ayush Saxena
Date: January 2026
Version: 1.0

Problem Formulation
Data Strategy
Feature Engineering
Model Selection
Evaluation Framework
Hyperparameter Tuning
Production Considerations

1. Problem Formulation

1.1 ML Problem Type

Classification: Recommendation System (Ranking Problem)

Input:

User ID
User profile (preferences, history)
Contextual information (time, location, weather)

Output:

Ranked list of top-N restaurants
Probability/score for each restaurant
Explanation for each recommendation

Objective Function:

Maximize: User satisfaction (orders placed from recommendations)
Subject to:
  - Diversity constraint (cuisine variety)
  - Latency constraint (<2 seconds)
  - Fairness constraint (restaurant visibility)

1.2 Success Criteria

Business Metrics:

Primary: Time-to-order reduction (40%)
Secondary: Conversion rate (+15%)
Tertiary: Discovery rate (2+ new restaurants/month)

ML Metrics:

Precision@10 ≥ 0.08
Hit Rate@10 ≥ 0.40
NDCG@10 ≥ 0.25
Diversity ≥ 0.60

1.3 Baseline Models

Baseline 1: Random

Randomly sample 10 restaurants
Hit Rate@10: ~0.05
Purpose: Establish lower bound

Baseline 2: Most Popular

Sort by (popularity × rating)
Hit Rate@10: ~0.32
Purpose: Simple non-personalized baseline

Baseline 3: Content-Based Only

Match user preferences to restaurant attributes
Hit Rate@10: ~0.38
Purpose: Personalization without collaboration

Target: Hybrid Model

CF + CB + Context
Hit Rate@10: ≥0.48
Purpose: Production model

2. Data Strategy

2.1 Data Sources

Synthetic Dataset (Portfolio Project):

50,000 users
500 restaurants
200,000 historical orders
Realistic distributions and patterns

Production Data (Future):

User profiles and order history
Restaurant metadata and availability
Real-time contextual data
User feedback (ratings, reviews)

2.2 Data Generation Methodology

User Generation

# Order count: Log-normal distribution (power law)
total_orders = np.random.lognormal(mean=1.5, sigma=1.2, size=50000)
# Result: Most users have 3-5 orders, some have 50+

# Order value: Gamma distribution (right-skewed)
avg_order_value = np.random.gamma(shape=4, scale=100, size=50000)
# Result: Most orders ₹300-500, some ₹1000+

# Preferences: Categorical with realistic proportions
dietary_preference = np.random.choice(
    ['veg', 'non_veg', 'vegan', 'no_preference'],
    p=[0.30, 0.50, 0.05, 0.15]  # India-specific distribution
)

Order Generation

# Users order from favorite cuisines 3× more often
if restaurant.cuisine == user.favorite_cuisine:
    selection_probability *= 3.0

# Users repeat from liked restaurants (70% repeat, 30% explore)
if random() < 0.70:
    restaurant = random.choice(user.previous_restaurants)
else:
    restaurant = random.choice(all_restaurants, weighted_by=popularity)

2.3 Data Quality Checks

Validation Rules:

No missing values in critical fields (user_id, restaurant_id)
Ratings in valid range (1.0-5.0)
Delivery times realistic (15-90 minutes)
Prices positive
Dates within valid range

Distribution Checks:

Order count follows power law (validated via log-log plot)
Rating distribution skewed towards 4-5 (realistic)
Cuisine distribution matches real-world (North Indian most popular)
User-restaurant sparsity ~99% (realistic for recommendations)

2.4 Train-Test Split

Temporal Split (Realistic Evaluation):

# Sort orders chronologically
orders_sorted = orders.sort_values('order_timestamp')

# 80% training, 20% test
split_idx = int(len(orders) * 0.8)
train_orders = orders_sorted[:split_idx]
test_orders = orders_sorted[split_idx:]

# Train models only on past data
# Predict future orders (mimics production)

Why Temporal Split?

Random split leaks future information
Temporal split is realistic (predict future from past)
Prevents overfitting on test set

Alternative Considered: K-fold cross-validation

Rejected: Doesn't respect temporal order
Risk: Unrealistically high metrics

3. Feature Engineering

3.1 User Features

Raw Features

raw_features = [
    'user_id',
    'total_orders',
    'favorite_cuisine',
    'dietary_preference',
    'price_sensitivity'
]

Engineered Features

1. Order-Based Features

# Frequency features
order_count = orders.groupby('user_id').size()
unique_restaurants = orders.groupby('user_id')['restaurant_id'].nunique()

# Monetary features
avg_order_value = orders.groupby('user_id')['order_value'].mean()
total_spent = orders.groupby('user_id')['order_value'].sum()

# Behavioral features
cuisine_diversity = unique_cuisines / total_orders  # How exploratory?
repeat_rate = repeat_orders / total_orders  # How loyal?

2. Recency Features

# Exponential decay for recency
days_since_last = (now - last_order_date).days
recency_score = exp(-days_since_last / 30)  # 30-day half-life

# Why exponential decay?
# - Recent behavior is more predictive
# - Smooth falloff (not step function)

3. Preference Features

# Most frequently ordered cuisine
most_ordered_cuisine = orders.groupby('user_id')['cuisine_type'].mode()

# Average rating given
avg_rating_given = orders.groupby('user_id')['user_rating'].mean()

# Price preference (derived from order history)
avg_price_range = orders.groupby('user_id')['price_range'].mean()

3.2 Restaurant Features

Raw Features

raw_features = [
    'restaurant_id',
    'name',
    'cuisine_type',
    'avg_rating',
    'price_range'
]

Engineered Features

1. Popularity Metrics

# Weighted popularity score
popularity_score = (
    0.4 × (total_orders / max_orders) +
    0.3 × (total_reviews / max_reviews) +
    0.3 × (avg_rating / 5.0)
)

# Why these weights?
# - Orders = actual demand (most important)
# - Reviews = user engagement
# - Rating = quality signal

2. Customer Retention

# Customers who ordered more than once
repeat_customers = (
    orders.groupby(['restaurant_id', 'user_id'])
    .size()
    .reset_index()
    .query('order_count > 1')
    .groupby('restaurant_id')
    .size()
)

retention_rate = repeat_customers / unique_customers

# High retention = quality restaurant

3. Efficiency Metrics

# Delivery efficiency (lower time = higher score)
delivery_efficiency = 1 - (avg_delivery_time - 20) / 40
delivery_efficiency = clip(delivery_efficiency, 0, 1)

# Value for money
value_score = avg_rating / price_range

# High rating + low price = high value

3.3 Interaction Features

User-Restaurant Interaction Matrix:

# Weighted interaction score
interaction_score = (
    0.4 × log1p(order_count) +          # Frequency (log scale)
    0.3 × (avg_rating / 5.0) +          # Satisfaction
    0.3 × exp(-days_since_last / 30)    # Recency
)

# Why log(order_count)?
# - Diminishing returns (10 orders not 10× better than 1)
# - Prevents single restaurant from dominating

# Why exponential decay for recency?
# - Recent preferences are more relevant
# - Tastes change over time

Matrix Properties:

Shape: 50,000 users × 500 restaurants
Sparsity: 99.2% (only 0.8% non-zero)
Storage: Sparse matrix format (CSR)

3.4 Contextual Features

Time-of-Day Encoding:

# One-hot encoding for meal times
time_encoding = {
    'breakfast': [1, 0, 0, 0, 0],
    'lunch': [0, 1, 0, 0, 0],
    'evening_snack': [0, 0, 1, 0, 0],
    'dinner': [0, 0, 0, 1, 0],
    'late_night': [0, 0, 0, 0, 1]
}

Distance Features:

# Haversine distance (simplified for demo)
def calculate_distance(user_lat, user_lon, rest_lat, rest_lon):
    lat_diff = (rest_lat - user_lat) * 111  # 1° lat ≈ 111 km
    lon_diff = (rest_lon - user_lon) * 111 * cos(radians(user_lat))
    return sqrt(lat_diff**2 + lon_diff**2)

# Distance decay
distance_score = exp(-distance / 3.0)  # 3km half-life

4. Model Selection

4.1 Model Comparison

Model	Pros	Cons	Hit Rate@10	Diversity
Random	Simple, unbiased	No personalization	0.05	0.85
Most Popular	Simple, fast	No personalization	0.32	0.23
Collaborative Filtering	Discovers patterns, no domain knowledge	Cold start, sparsity	0.51	0.58
Content-Based	Works for new users, explainable	No discovery, over-specialization	0.38	0.65
Hybrid (Ours)	Best of both, handles cold start	More complex	0.48	0.72

4.2 Collaborative Filtering

Algorithm Choice: User-Based CF

Alternatives Considered:

Item-Based CF
- Pro: More stable (restaurants don't change)
- Con: Harder to explain ("Users who liked X also liked Y")
- Decision: User-based for explainability
Matrix Factorization (SVD)
- Pro: Handles sparsity better
- Con: Not interpretable, requires tuning
- Decision: Simple CF for MVP, SVD for V2
Neural Collaborative Filtering
- Pro: Can capture complex patterns
- Con: Requires massive data, hard to explain
- Decision: Overkill for dataset size

Implementation Details:

# Similarity metric: Cosine similarity
# Why? Handles varying order counts (normalized)

user_similarity = cosine_similarity(interaction_matrix)

# Alternative considered: Pearson correlation
# Rejected: Requires mean-centering, less intuitive

# Top-K similar users: K=30
# Why? Balance between diversity and relevance
# Too low (K=5): Too narrow, low recall
# Too high (K=100): Noise, computational cost

4.3 Content-Based Filtering

Feature Set:

restaurant_features = [
    'cuisine_type_encoded',  # One-hot encoded
    'price_range',           # 1-4 scale
    'avg_rating',            # 1-5 scale
    'is_veg_only',           # Binary
    'delivery_efficiency',   # 0-1 normalized
    'value_score',           # rating/price
    'popularity_score'       # 0-1 normalized
]

Similarity Metric: Weighted Euclidean

# After standardization
scaler = StandardScaler()
features_scaled = scaler.fit_transform(restaurant_features)

# Cosine similarity on scaled features
similarity = cosine_similarity(features_scaled)

Feature Weight Justification:

Cuisine (40%): Strongest preference signal
Price (25%): Budget is constraining factor
Rating (20%): Quality threshold
Delivery (15%): Convenience factor

4.4 Hybrid Combination Strategy

Weight Selection Process:

Experiment 1: Grid Search

weight_combinations = [
    (0.6, 0.3, 0.1),  # CF-heavy
    (0.5, 0.4, 0.1),
    (0.4, 0.35, 0.25),  # Balanced
    (0.3, 0.5, 0.2),   # CB-heavy
]

results = evaluate_all_combinations(weight_combinations)

Results:

CF	CB	Context	Hit Rate	Diversity	Decision
0.6	0.3	0.1	0.52	0.61	Too narrow
0.5	0.4	0.1	0.49	0.68	Low context
0.4	0.35	0.25	0.48	0.72	Selected
0.3	0.5	0.2	0.43	0.75	Low accuracy

Decision Rationale:

40-35-25 split balances accuracy and diversity
Higher context weight acknowledges real-world importance
Slight CF advantage for discovery

Adaptive Weighting:

# For cold start users (<3 orders)
if user_order_count < 3:
    weights = (0.0, 0.75, 0.25)  # No CF, more CB
else:
    weights = (0.4, 0.35, 0.25)  # Full hybrid

5. Evaluation Framework

5.1 Offline Metrics

Ranking Metrics

Precision@K

def precision_at_k(recommendations, actual_orders, k):
    """
    Precision@K = (Relevant items in top-K) / K
    
    Example:
    - Top-10 recommendations: [A, B, C, D, E, F, G, H, I, J]
    - User ordered: [C, F, K]
    - Relevant in top-10: {C, F}
    - Precision@10 = 2/10 = 0.20
    """
    top_k = recommendations[:k]
    hits = len(set(top_k) & set(actual_orders))
    return hits / k

Recall@K

def recall_at_k(recommendations, actual_orders, k):
    """
    Recall@K = (Relevant items in top-K) / Total relevant items
    
    Example:
    - Top-10 recommendations: [A, B, C, D, E, F, G, H, I, J]
    - User ordered: [C, F, K]
    - Relevant in top-10: {C, F}
    - Recall@10 = 2/3 = 0.67
    """
    top_k = recommendations[:k]
    hits = len(set(top_k) & set(actual_orders))
    return hits / len(actual_orders) if len(actual_orders) > 0 else 0

Hit Rate@K

def hit_rate_at_k(recommendations, actual_orders, k):
    """
    Hit Rate@K = 1 if any actual order in top-K, else 0
    
    Binary metric: Did we get at least one right?
    Average across users gives overall hit rate.
    
    Example:
    - User A: Top-10 contains actual order → 1
    - User B: Top-10 doesn't contain actual order → 0
    - Hit Rate = (1 + 0) / 2 = 0.50
    """
    top_k = recommendations[:k]
    return 1.0 if len(set(top_k) & set(actual_orders)) > 0 else 0.0

NDCG@K (Normalized Discounted Cumulative Gain)

def ndcg_at_k(recommendations, actual_orders, k):
    """
    NDCG accounts for position: Items ranked higher = more credit
    
    DCG = Σ(relevance / log2(position + 1))
    NDCG = DCG / Ideal DCG
    
    Example:
    - Top-5: [A, B*, C, D*, E]  (* = ordered)
    - DCG = 0 + 1/log2(3) + 0 + 1/log2(5) + 0 = 1.06
    - Ideal: [*, *, A, C, E] → IDCG = 1/log2(2) + 1/log2(3) = 1.63
    - NDCG = 1.06 / 1.63 = 0.65
    """
    dcg = 0.0
    for i, restaurant in enumerate(recommendations[:k]):
        if restaurant in actual_orders:
            dcg += 1.0 / np.log2(i + 2)  # +2 because log2(1)=0
    
    # Ideal DCG (all relevant items at top)
    n_relevant = min(len(actual_orders), k)
    idcg = sum(1.0 / np.log2(i + 2) for i in range(n_relevant))
    
    return dcg / idcg if idcg > 0 else 0.0

Discovery Metrics

Diversity Score

def diversity_score(recommendations):
    """
    Cuisine diversity in recommendations
    
    Example:
    - Top-10 cuisines: [North Indian, North Indian, Chinese, 
                         South Indian, Italian, Chinese, 
                         North Indian, Fast Food, Biryani, Cafe]
    - Unique cuisines: 7
    - Diversity = 7 / 10 = 0.70
    """
    cuisines = [r.cuisine_type for r in recommendations]
    unique_cuisines = len(set(cuisines))
    return unique_cuisines / len(recommendations)

Novelty Score

def novelty_score(recommendations, user_order_history):
    """
    Proportion of new (never ordered) restaurants
    
    Example:
    - Top-10 recommendations: [A, B, C, D, E, F, G, H, I, J]
    - User history: [A, D, K, L]
    - New restaurants: [B, C, E, F, G, H, I, J] = 8
    - Novelty = 8 / 10 = 0.80
    """
    new_restaurants = [
        r for r in recommendations 
        if r not in user_order_history
    ]
    return len(new_restaurants) / len(recommendations)

Coverage Score

def coverage_score(all_recommendations, total_restaurants):
    """
    What % of catalog is ever recommended?
    
    Measures if recommender is too narrow
    
    Example:
    - Total restaurants: 500
    - Unique restaurants recommended across all users: 350
    - Coverage = 350 / 500 = 0.70
    """
    unique_recommended = set()
    for recs in all_recommendations:
        unique_recommended.update(recs)
    
    return len(unique_recommended) / total_restaurants

5.2 Online Metrics (A/B Test)

Primary Metric: Time to Order

Measurement: From home page load to order confirmation
Target: Reduce by 40% (10 min → 6 min)
Statistical test: Two-sample t-test

Secondary Metrics:

Order conversion rate
Restaurant discovery rate
User session duration

Guardrail Metrics:

Order cancellation rate (must not increase)
Average delivery time (must not degrade)
User satisfaction (ratings/surveys)

5.3 Model Performance Summary

Evaluation Results (100 Test Users):

Accuracy Metrics:
├─ Precision@5:   0.0842  (8.4% of top-5 were ordered)
├─ Precision@10:  0.0756  (7.6% of top-10 were ordered)
├─ Recall@10:     0.2145  (21.5% of actual orders in top-10)
├─ Hit Rate@10:   0.4823  (48.2% users ordered from top-10)
└─ NDCG@10:       0.2567  (Good ranking quality)

Discovery Metrics:
├─ Diversity:     0.7234  (7+ cuisines in top-10)
├─ Novelty:       0.6421  (64% new restaurants)
└─ Coverage:      0.4523  (45% catalog recommended)

Interpretation:

Hit Rate = 48%: Nearly half of users find relevant restaurant in top-10
Diversity = 72%: Recommendations span multiple cuisines
Novelty = 64%: Good balance of discovery vs familiarity

6. Hyperparameter Tuning

6.1 Parameters Tuned

Collaborative Filtering:

params = {
    'n_neighbors': [10, 20, 30, 50],     # Number of similar users
    'similarity_metric': ['cosine', 'pearson'],
    'min_common_items': [3, 5, 10]       # Minimum overlap for similarity
}

# Best: n_neighbors=30, metric='cosine', min_common=5

Content-Based Filtering:

params = {
    'feature_weights': {
        'cuisine': [0.3, 0.4, 0.5],
        'price': [0.2, 0.25, 0.3],
        'rating': [0.15, 0.20, 0.25],
        'delivery': [0.1, 0.15, 0.2]
    }
}

# Best: (0.4, 0.25, 0.2, 0.15)

Hybrid Weights:

params = {
    'cf_weight': [0.3, 0.4, 0.5, 0.6],
    'cb_weight': [0.25, 0.35, 0.45],
    'context_weight': [0.1, 0.15, 0.25]
}

# Best: (0.4, 0.35, 0.25)

6.2 Tuning Process

Method: Grid search with cross-validation

Objective Function:

def objective(params):
    # Primary: Hit Rate (most important)
    hit_rate = evaluate_hit_rate(params)
    
    # Secondary: Diversity (prevent filter bubble)
    diversity = evaluate_diversity(params)
    
    # Combined score (weighted)
    score = 0.7 * hit_rate + 0.3 * diversity
    
    return score

Why This Objective?

Hit rate measures user satisfaction
Diversity prevents recommendation fatigue
70-30 split prioritizes accuracy but ensures variety

7. Production Considerations

7.1 Latency Optimization

Current Latency Breakdown:

Total: ~1.8 seconds
├─ Data loading:       0.3s  (user profile, order history)
├─ CF computation:     0.5s  (similarity lookup + aggregation)
├─ CB computation:     0.4s  (feature matching)
├─ Contextual boost:   0.2s  (time/weather/distance)
├─ Ranking + filters:  0.3s  (diversity, explanations)
└─ Response format:    0.1s

Optimization Strategies:

Pre-computation:

   # Nightly batch job
   for user in all_users:
       top_100_recommendations = hybrid_model.recommend(user, n=100)
       cache.set(f"recs:{user}", top_100_recommendations)
   
   # Real-time: Apply contextual boost to cached results
   cached_recs = cache.get(f"recs:{user}")
   contextualized_recs = apply_context(cached_recs, current_context)
   return contextualized_recs[:10]

Approximate Nearest Neighbors:

   # Replace exact cosine similarity with ANN
   from annoy import AnnoyIndex
   
   # Build index
   index = AnnoyIndex(n_features, metric='angular')
   for i, vector in enumerate(user_vectors):
       index.add_item(i, vector)
   index.build(n_trees=10)
   
   # Query: O(log n) instead of O(n)
   similar_users = index.get_nns_by_item(user_idx, k=30)

Caching:

   # Redis cache for frequent queries
   cache_key = f"recs:{user_id}:{context_hash}"
   cached = redis.get(cache_key)
   
   if cached:
       return cached  # <10ms
   else:
       recommendations = compute_recommendations()
       redis.setex(cache_key, ttl=3600, value=recommendations)
       return recommendations

7.2 Model Update Strategy

Current: Weekly batch re-training

Production:

Daily Batch Updates
- Re-train on last 6 months of data
- Deploy new model if metrics improve
Incremental Learning
- Update user embeddings on new orders
- No full re-train needed
Online Learning
- Multi-armed bandit for exploration
- Learn from real-time feedback

7.3 Monitoring & Alerting

Model Drift Detection:

# Track daily metrics
metrics_today = {
    'hit_rate': 0.48,
    'diversity': 0.72,
    'latency_p95': 1.8
}

# Alert if degradation >10%
if metrics_today['hit_rate'] < baseline['hit_rate'] * 0.9:
    alert("Model performance degraded!")

Data Quality Checks:

# Pre-training validation
assert no_missing_values(data)
assert ratings_in_range(data, 1, 5)
assert sparsity_within_bounds(data, 0.98, 0.995)

8. Ethical Considerations

8.1 Bias Mitigation

Potential Biases:

Popularity Bias: Recommend only popular restaurants
Position Bias: Users click top recommendations more
Cuisine Bias: Over-represent majority cuisines
Price Bias: Favor expensive restaurants (higher commission)

Mitigation Strategies:

Diversity Constraints:

   # Maximum 3 restaurants per cuisine
   # Ensures minority cuisines get visibility

Fairness Metrics:

   # Monitor recommendation distribution
   cuisine_distribution = recommendations.groupby('cuisine').size()
   
   # Alert if any cuisine <2% representation
   if any(cuisine_distribution / total < 0.02):
       flag_for_review()

Exploration:

   # 10% of recommendations are random (exploration)
   if random() < 0.10:
       return random_restaurant()
   else:
       return top_recommendation()

8.2 Privacy Considerations

Data Minimization:

Store only necessary data (last 6 months)
No PII in model features
Aggregate user data for analysis

User Control:

Allow users to clear history
Opt-out of personalization
Data export option

9. Lessons Learned

9.1 What Worked Well

✅ Hybrid approach: Best balance of accuracy and discovery
✅ Explainability: Users trust recommendations with clear reasons
✅ Cold start strategy: Onboarding questions work well
✅ Temporal evaluation: Realistic metric estimates

9.2 What Could Be Improved

⚠️ Deep learning: Could capture more complex patterns
⚠️ Online learning: Adapt to user feedback in real-time
⚠️ Multi-objective: Optimize for multiple goals simultaneously
⚠️ Contextual bandits: Better exploration-exploitation balance

9.3 Future Directions

Short-term (V2):

Real-time availability filtering
User feedback loop ("Not interested")
A/B testing framework

Long-term (V3):

Deep learning embeddings (BERT for text)
Graph neural networks (user-restaurant-cuisine)
Reinforcement learning (long-term satisfaction)
Multi-task learning (predict rating + order probability)

Document Owner: Ayush Saxena
Status: Complete and Production-Ready

FilesExpand file tree

methodology.md

Latest commit

History

methodology.md

File metadata and controls

ML Methodology: Restaurant Recommendation System

Table of Contents

1. Problem Formulation

1.1 ML Problem Type

1.2 Success Criteria

1.3 Baseline Models

2. Data Strategy

2.1 Data Sources

2.2 Data Generation Methodology

User Generation

Order Generation

2.3 Data Quality Checks

2.4 Train-Test Split

3. Feature Engineering

3.1 User Features

Raw Features

Engineered Features

3.2 Restaurant Features

Raw Features

Engineered Features

3.3 Interaction Features

3.4 Contextual Features

4. Model Selection

4.1 Model Comparison

4.2 Collaborative Filtering

4.3 Content-Based Filtering

4.4 Hybrid Combination Strategy

5. Evaluation Framework

5.1 Offline Metrics

Ranking Metrics

Discovery Metrics

5.2 Online Metrics (A/B Test)

5.3 Model Performance Summary

6. Hyperparameter Tuning

6.1 Parameters Tuned

6.2 Tuning Process

7. Production Considerations

7.1 Latency Optimization

7.2 Model Update Strategy

7.3 Monitoring & Alerting

8. Ethical Considerations

8.1 Bias Mitigation

8.2 Privacy Considerations

9. Lessons Learned

9.1 What Worked Well

9.2 What Could Be Improved

9.3 Future Directions