A comprehensive market basket analysis project using the Instacart dataset to discover product purchase patterns and association rules through the Apriori algorithm.
This project performs association rule mining on Instacart's e-commerce transaction data to uncover meaningful product relationships and co-purchase patterns. The analysis helps identify which products are frequently bought together, enabling data-driven recommendations and business insights.
- Analyze customer purchase behavior using market basket analysis
- Extract frequent itemsets using the Apriori algorithm
- Generate association rules with support, confidence, and lift metrics
- Provide actionable business insights for e-commerce optimization
The project uses the Instacart Market Basket Analysis dataset, which includes:
- aisles.csv: Aisle information (134 aisles)
- departments.csv: Department information (21 departments)
- products.csv: Product details (49,688 products)
- orders.csv: Order metadata (3.4M+ orders)
- order_products__prior.csv: Prior order products (32M+ records)
- order_products__train.csv: Training order products (1.4M+ records)
The analysis uses a sample of 20,000 orders containing 210,733 product transactions across 21,948 unique products.
- Python 3.x
- pandas: Data manipulation and analysis
- numpy: Numerical computations
- mlxtend: Apriori algorithm and association rule mining
- networkx: Graph-based analysis (optional)
- matplotlib: Data visualization
- Clone the repository:
    git clone https://github.com/ParsaHaghighatgoo/instacart-association-rule-mining.git
    cd instacart-association-rule-mining
- Create a virtual environment (recommended):
    python -m venv .venv
    source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install required dependencies:
    pip install -r requirements.txt
Run the Jupyter notebook to perform the complete analysis:
    jupyter notebook main.ipynb
- Data Loading: Load and explore the Instacart dataset
- Data Preprocessing:
  - Filter orders with at least 2 products
  - Merge product names with transaction data
  - Sample 20,000 orders for analysis
- Basket Creation: Transform transactions into market baskets
- Apriori Algorithm: Extract frequent itemsets with different support thresholds
- Association Rules: Generate rules with confidence and lift metrics
- Visualization: Create charts and graphs for insights
- Business Interpretation: Translate findings into actionable recommendations
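The preprocessing and basket-creation steps above can be sketched roughly as follows. This is a minimal illustration on toy in-memory data, not the notebook's actual code: the helper name `build_baskets` and its parameters are hypothetical, and the column names (`order_id`, `product_id`, `product_name`) are assumed to match the dataset's CSV files.

```python
import pandas as pd

def build_baskets(order_products, products, n_orders, min_items=2, seed=42):
    """Return a boolean order x product matrix for a random sample of orders.

    Hypothetical helper: assumes `order_products` has order_id/product_id
    columns and `products` has product_id/product_name columns.
    """
    # Keep only orders that contain at least `min_items` products.
    sizes = order_products.groupby("order_id")["product_id"].transform("size")
    filtered = order_products[sizes >= min_items]

    # Attach readable product names to each transaction row.
    named = filtered.merge(
        products[["product_id", "product_name"]], on="product_id", how="left"
    )

    # Sample whole orders (not individual rows) so baskets stay intact.
    order_ids = named["order_id"].drop_duplicates()
    sampled_ids = order_ids.sample(n=min(n_orders, len(order_ids)), random_state=seed)
    sampled = named[named["order_id"].isin(sampled_ids)]

    # One row per order, one boolean column per product.
    return pd.crosstab(sampled["order_id"], sampled["product_name"]).astype(bool)

# Toy data standing in for order_products__prior.csv and products.csv.
order_products = pd.DataFrame(
    {"order_id": [1, 1, 2, 3, 3, 3], "product_id": [10, 11, 10, 11, 12, 10]}
)
products = pd.DataFrame(
    {
        "product_id": [10, 11, 12],
        "product_name": ["Banana", "Organic Strawberries", "Organic Raspberries"],
    }
)

baskets = build_baskets(order_products, products, n_orders=2)
print(baskets)
```

Order 2 is dropped because it holds a single product. For the real sample (20,000 orders across 21,948 products) a dense matrix of that shape has roughly 439 million cells, so a sparse encoding may be preferable.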
- min_support = 0.01: 128 frequent itemsets (max size: 2)
- min_support = 0.05: 7 frequent itemsets (max size: 1)
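To illustrate why a higher support threshold leaves only single-item itemsets, here is a small dependency-free sketch of level-wise Apriori on toy baskets. The project itself relies on mlxtend; the function `apriori_sketch`, the toy transactions, and the thresholds 0.4 and 0.6 are assumptions made for this example only.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for itemsets meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

# Toy baskets (made up for illustration, not taken from the dataset).
transactions = [
    ["Banana", "Organic Strawberries"],
    ["Banana", "Organic Strawberries", "Organic Raspberries"],
    ["Banana"],
    ["Organic Raspberries", "Organic Strawberries"],
    ["Banana", "Organic Avocado"],
]

low = apriori_sketch(transactions, min_support=0.4)
high = apriori_sketch(transactions, min_support=0.6)
print(len(low), max(len(s) for s in low))
print(len(high), max(len(s) for s in high))
```

On this toy data the lower threshold keeps two pairs alongside the single items, while the higher one keeps single items only, which mirrors the pattern in the results above.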
The strongest associations (by lift) include:
- Organic Raspberries → Organic Strawberries (lift ≈ 2.95)
- Strong co-purchase patterns among organic fruits
- High correlation between complementary produce items
- Support: Fraction of transactions that contain the itemset
- Confidence: Conditional probability of the consequent given the antecedent
- Lift: Confidence divided by the consequent's support, i.e. strength of association relative to independence (lift > 1 indicates positive correlation)
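As a worked example of the three metrics, the sketch below computes them by hand for a single rule on toy baskets. The function name `rule_metrics` and the four transactions are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute support, confidence and lift for the rule antecedent -> consequent.

    Assumes both the antecedent and the consequent occur in at least one basket.
    """
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    a, c = set(antecedent), set(consequent)

    support_a = sum(a <= b for b in baskets) / n         # P(A)
    support_c = sum(c <= b for b in baskets) / n         # P(C)
    support_ac = sum((a | c) <= b for b in baskets) / n  # P(A and C)

    confidence = support_ac / support_a                  # P(C | A)
    lift = confidence / support_c                        # P(C | A) / P(C)
    return {"support": support_ac, "confidence": confidence, "lift": lift}

# Toy baskets (made up for illustration).
toy = [
    ["Organic Raspberries", "Organic Strawberries"],
    ["Organic Raspberries", "Organic Strawberries", "Banana"],
    ["Banana"],
    ["Banana", "Organic Strawberries"],
]

metrics = rule_metrics(toy, ["Organic Raspberries"], ["Organic Strawberries"])
print(metrics)
```

Here the rule holds in 2 of 4 baskets (support 0.5), every basket with raspberries also has strawberries (confidence 1.0), and strawberries appear in 3 of 4 baskets, so lift is 1.0 / 0.75, about 1.33.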
- Product Recommendations: Real-time suggestions based on cart contents
- Cross-Selling & Bundling: Create "Frequently Bought Together" offers
- Targeted Promotions: Offer discounts on associated products
- Website Optimization: Place related products closer together
- Inventory Management: Forecast demand for associated products
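One way the cart-based recommendation idea could work is sketched below: rules whose antecedent is fully contained in the cart contribute their consequents, ranked by lift. The `recommend` function is hypothetical, and apart from the Organic Raspberries → Organic Strawberries rule reported above, the rules and lift values are placeholders invented for this example.

```python
def recommend(cart, rules, top_n=3):
    """Suggest items not yet in the cart, ranked by the best matching rule's lift."""
    cart = set(cart)
    scores = {}
    for antecedent, consequent, lift in rules:
        # A rule applies only if every antecedent item is already in the cart.
        if set(antecedent) <= cart:
            for item in consequent:
                if item not in cart:
                    scores[item] = max(scores.get(item, 0.0), lift)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# (antecedent, consequent, lift) triples.
rules = [
    (("Organic Raspberries",), ("Organic Strawberries",), 2.95),  # from the results above
    (("Item A",), ("Item B",), 1.5),                              # placeholder rule
    (("Item C",), ("Item D",), 4.0),                              # placeholder, antecedent not in cart
]

suggestions = recommend(["Organic Raspberries", "Item A"], rules)
print(suggestions)
```

Ranking by lift rather than confidence is a design choice for this sketch; it favors items that are bought together more often than their individual popularity would predict.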
instacart-association-rule-mining/
├── main.ipynb # Main analysis notebook
├── requirements.txt # Python dependencies
├── README.md # Project documentation
├── .gitignore # Git ignore rules
├── EC-hw3.pdf # Assignment description
├── instacart_association_rule_mining_doc.pdf # Detailed documentation
├── e_commerce_HW3_dataset/ # Dataset folder
│ ├── aisles.csv
│ ├── departments.csv
│ ├── products.csv
│ ├── orders.csv
│ ├── order_products__prior.csv
│ └── order_products__train.csv
├── Docs/ # Analysis documentation
│ ├── Task3_questions.md
│ ├── Task4_Association Rules Analysis.md
│ ├── Task5_BusinessInterpretation.md
│ ├── Task5_Question2.md
│ └── Task5_Question3.md
└── figures/ # Visualization outputs
    ├── 1.png
    ├── 2.png
    ├── 3.png
    ├── 4.png
    └── top_items.png
Detailed analysis and findings are available in the Docs/ folder:
- Task 3: Conceptual questions about support thresholds
- Task 4: Association rules analysis and metrics
- Task 5: Business interpretation and practical applications
The project includes several visualizations in the figures/ directory:
- Top frequent items bar charts
- Association rule network graphs
- Support vs. confidence scatter plots
- Itemset distribution analysis
The analysis demonstrates that:
- A lower support threshold (0.01) surfaces two-item patterns, whereas 0.05 leaves only single-item itemsets
- Organic produce shows strong co-purchase associations
- The lift metric helps separate genuine associations from pairs that co-occur only because each item is popular on its own
- Association rules can drive significant business value in e-commerce
This project is part of an academic assignment for educational purposes.
Parsa Haghighatgoo
- Dataset provided by Instacart
- Course: E-Commerce (EC-HW3)
- Institution: CSE Program, Term 9
For questions or feedback, please open an issue in the repository.
Note: This project is for educational purposes and demonstrates the application of association rule mining techniques in e-commerce analytics.