
mahmoudnajmeh/manufacturing-analytics-spark


๐Ÿญ Manufacturing Analytics Platform

Apache Spark, Delta Lake, and AI-powered analytics for manufacturing data.

Author

Mahmoud Najmeh


🎯 Overview

The Manufacturing Analytics Platform processes factory sensor data using:

  • Apache Spark for distributed data processing
  • Delta Lake for ACID transactions and time travel
  • MLlib for predictive maintenance clustering
  • Structured Streaming for anomaly detection
  • AI assistant with web search integration

The platform analyzes manufacturing telemetry including temperature, pressure, vibration, power consumption, production rates, and defect rates across multiple machines and shifts.

Key Capabilities

  • Real-time Processing: Sub-second anomaly detection from streaming IoT sensors
  • Delta Lake Time Travel: Query any historical version (versionAsOf, timestampAsOf)
  • MLlib Predictive Maintenance: K-means clustering for machine risk assessment
  • Graph Analytics: Machine-shift relationship mapping
  • AI Assistant: Natural language Q&A with web search (Groq + Tavily)
  • ACID Compliance: Schema enforcement, transaction logging, VACUUM support
  • Multiple Outputs: Executive dashboards, warehouse exports, quality alerts
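The machine-shift relationship mapping can be pictured as a small weighted graph. The following is a plain-Python sketch, not the project's `graph_analytics.py`: the function names, the choice of mean production rate as the edge weight, and the `Night` rows are assumptions made for illustration.

```python
from collections import defaultdict


def build_machine_shift_edges(rows):
    """Return {(machine_id, shift): mean production_rate} as weighted edges.

    rows: iterable of (machine_id, shift, production_rate) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])  # edge -> [sum of rates, count]
    for machine_id, shift, rate in rows:
        entry = totals[(machine_id, shift)]
        entry[0] += rate
        entry[1] += 1
    return {edge: total / count for edge, (total, count) in totals.items()}


def machines_by_shift(edges):
    """Group machine ids by the shifts they are connected to."""
    grouped = defaultdict(set)
    for machine_id, shift in edges:
        grouped[shift].add(machine_id)
    return dict(grouped)


# Illustrative rows; the "Night" shift and the second 101 row are made up.
rows = [
    (101, "Morning", 98.5),
    (101, "Morning", 97.5),
    (102, "Morning", 97.8),
    (101, "Night", 96.0),
]

edges = build_machine_shift_edges(rows)
print(edges)
print(machines_by_shift(edges))
```

Each edge links one machine to one shift, so questions such as "which machines ran in the morning" reduce to a lookup over the edge keys.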

✨ Features

Core Features

  • ๐Ÿญ Real-time sensor ingestion from CSV and JSON streaming sources

  • ๐Ÿ” 5-stage analytics pipeline: Ingest โ†’ Process โ†’ ML โ†’ Graph โ†’ Output

  • ๐Ÿค– AI-powered assistant with real-time web search (Groq Llama 3.3 + Tavily)

  • ๐Ÿ“Š Delta Lake Time Travel & Versioning:

    • versionAsOf - Query historical snapshots by version number
    • timestampAsOf - Query data as it existed at any point in time
    • history() - Full version log with operation metadata
    • VACUUM - Physical deletion of old Parquet files (GDPR compliance)
    • 4 versions tracked (0โ†’1โ†’2โ†’3) with complete audit trail
  • โš™๏ธ MLlib predictive maintenance: K-means clustering for risk levels (High/Medium/Low)

  • ๐Ÿ“ˆ GraphX-style analysis: Machine-shift relationships and performance mapping

  • ๐Ÿ’พ Multiple storage formats: Delta Lake, Parquet, CSV exports

  • ๐Ÿ“Š Executive dashboards: Shift performance, defect rates, OEE metrics
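To make the K-means risk levels concrete, here is a Spark-free sketch of the idea. It assumes a single feature (mean vibration per machine) and made-up readings. The project itself uses MLlib, and its actual feature set is not stated here, so treat the names and numbers below as hypothetical.

```python
RISK_LABELS = ("Low", "Medium", "High")


def kmeans_1d(values, k=3, iterations=20):
    """Lloyd's algorithm on scalars; returns centroids sorted ascending.

    Assumes k >= 2 and at least k values.
    """
    ordered = sorted(values)
    # Deterministic init: spread the starting centroids across the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        updated = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return sorted(centroids)


def risk_levels(vibration_by_machine):
    """Map each machine to Low/Medium/High by its nearest centroid."""
    centroids = kmeans_1d(list(vibration_by_machine.values()))

    def label(v):
        idx = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
        return RISK_LABELS[idx]

    return {machine: label(v) for machine, v in vibration_by_machine.items()}


# Made-up mean vibration per machine, for illustration only.
sample = {101: 0.28, 102: 0.32, 103: 0.55, 104: 0.60, 105: 0.95, 106: 1.05}
print(risk_levels(sample))
```

Sorting the centroids is what turns anonymous cluster ids into ordered risk labels: the lowest centroid becomes Low and the highest becomes High.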

Technical Features

  • Modular architecture: 12 specialized modules with clear separation of concerns
  • Structured Streaming: 30-second windowed anomaly detection
  • Schema enforcement: Automatic rejection of mismatched data types
  • Comprehensive testing: 15 passing tests with pytest
  • Professional project structure: src/ layout with uv package manager
  • Real-time web search: Tavily + Groq integration for up-to-date answers
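The 30-second windowed anomaly detection can be illustrated without Spark. This sketch groups readings into tumbling 30-second windows per machine and flags windows whose mean temperature exceeds a limit. The threshold value, the choice of temperature as the monitored signal, and the function names are assumptions; the project's `streaming.py` may work differently.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

WINDOW_SECONDS = 30
TEMPERATURE_LIMIT = 80.0  # hypothetical threshold, not taken from the project


def window_start(ts):
    """Floor a timestamp to the start of its 30-second tumbling window."""
    floored = ts.second - ts.second % WINDOW_SECONDS
    return ts.replace(second=floored, microsecond=0)


def detect_anomalies(events):
    """Return sorted (machine_id, window_start, mean_temperature) tuples
    for every window whose mean temperature exceeds TEMPERATURE_LIMIT.

    events: iterable of (machine_id, timestamp, temperature).
    """
    buckets = defaultdict(list)
    for machine_id, ts, temperature in events:
        buckets[(machine_id, window_start(ts))].append(temperature)
    return sorted(
        (machine_id, start, mean(values))
        for (machine_id, start), values in buckets.items()
        if mean(values) > TEMPERATURE_LIMIT
    )


# Made-up events for illustration.
base = datetime(2026, 4, 21, 8, 0, 0)
events = [
    (101, base + timedelta(seconds=5), 79.0),
    (101, base + timedelta(seconds=20), 85.0),  # same window as the row above
    (101, base + timedelta(seconds=35), 75.0),  # next window
    (102, base + timedelta(seconds=10), 74.0),
]

for machine_id, start, avg in detect_anomalies(events):
    print(f"machine {machine_id} window {start:%H:%M:%S} mean temperature {avg}")
```

Averaging over a window rather than alerting on single readings smooths out one-off spikes, at the cost of reporting up to one window length later.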

๐Ÿ— Architecture

Architecture Overview Diagram

Image

Delta Lake Time Travel & Versioning Flow

Image

AI Assistant Architecture

Image

Complete Pipeline Data Flow

Image

Project Module Structure

Image

Technology Stack

Image

Test Coverage Diagram

Image

🚀 Quick Start

Prerequisites

  • Python 3.12 or higher
  • uv for dependency management
  • 4GB RAM minimum (8GB recommended)
  • (Optional) API keys for AI features

Setup Steps

# 1. Clone the repository
git clone https://github.com/mahmoudnajmeh/manufacturing-analytics-spark.git
cd manufacturing-analytics-spark

# 2. Install dependencies using uv
uv sync

# 3. Create data directory and add your CSV file
mkdir -p data
# Copy production_metrics.csv to data/

# 4. (Optional) Set up API keys for AI assistant
export TAVILY_API_KEY="your-tavily-key"
export GROQ_API_KEY="your-groq-key"

# 5. Run the complete pipeline
uv run python -m manufacturing_analytics.main
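Because the API keys are optional, it can help to check for them before expecting AI features to work. A minimal sketch: the helper name `missing_api_keys` is hypothetical, and it assumes both keys are needed for the assistant, which the project may or may not enforce.

```python
import os

REQUIRED_KEYS = ("TAVILY_API_KEY", "GROQ_API_KEY")


def missing_api_keys(env=None):
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


missing = missing_api_keys()
if missing:
    print("AI assistant unavailable; missing:", ", ".join(missing))
else:
    print("AI assistant keys found")
```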

Data Format

Create data/production_metrics.csv with the following schema:

machine_id,temperature,pressure,vibration,power_consumption,production_rate,defect_rate,timestamp,shift
101,72.5,14.2,0.32,45.6,98.5,2.1,2026-04-21 08:00:00,Morning
102,74.1,14.5,0.28,47.2,97.8,1.9,2026-04-21 08:00:00,Morning
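To sanity-check a file against this schema before running the pipeline, the rows can be parsed into typed records with the standard library. This is a sketch under assumptions: the `SensorReading` class and `parse_readings` helper are illustrative names, not part of the project, and the column types are inferred from the sample rows above.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime

FLOAT_COLUMNS = (
    "temperature", "pressure", "vibration",
    "power_consumption", "production_rate", "defect_rate",
)


@dataclass(frozen=True)
class SensorReading:
    machine_id: int
    temperature: float
    pressure: float
    vibration: float
    power_consumption: float
    production_rate: float
    defect_rate: float
    timestamp: datetime
    shift: str


def parse_readings(text):
    """Parse CSV text into typed readings.

    Raises ValueError if a value does not match its column type,
    and KeyError if a column is missing.
    """
    readings = []
    for row in csv.DictReader(io.StringIO(text)):
        metrics = {name: float(row[name]) for name in FLOAT_COLUMNS}
        readings.append(
            SensorReading(
                machine_id=int(row["machine_id"]),
                timestamp=datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S"),
                shift=row["shift"],
                **metrics,
            )
        )
    return readings


SAMPLE = """machine_id,temperature,pressure,vibration,power_consumption,production_rate,defect_rate,timestamp,shift
101,72.5,14.2,0.32,45.6,98.5,2.1,2026-04-21 08:00:00,Morning
102,74.1,14.5,0.28,47.2,97.8,1.9,2026-04-21 08:00:00,Morning
"""

readings = parse_readings(SAMPLE)
print(readings[0])
```

Failing fast on a mismatched type mirrors, in a small way, the schema enforcement that Delta Lake applies on write.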

๐Ÿ“ Project Structure

manufacturing_analytics_spark/
โ”œโ”€โ”€ .venv/
โ”œโ”€โ”€ chroma_db/
โ”œโ”€โ”€ data/
โ”‚   โ”œโ”€โ”€ production_metrics.csv
โ”‚   โ””โ”€โ”€ sensor_streaming/
โ”œโ”€โ”€ lake/
โ”‚   โ”œโ”€โ”€ manufacturing_delta/
โ”‚   โ”œโ”€โ”€ manufacturing_parquet/
โ”‚   โ”œโ”€โ”€ manufacturing_warehouse/
โ”‚   โ””โ”€โ”€ quality_alerts/
โ”œโ”€โ”€ src/
โ”‚   โ””โ”€โ”€ manufacturing_analytics/
โ”‚       โ”œโ”€โ”€ __init__.py
โ”‚       โ”œโ”€โ”€ ai_assistant.py
โ”‚       โ”œโ”€โ”€ config.py
โ”‚       โ”œโ”€โ”€ data_ingestion.py
โ”‚       โ”œโ”€โ”€ delta_time_travel.py
โ”‚       โ”œโ”€โ”€ graph_analytics.py
โ”‚       โ”œโ”€โ”€ main.py
โ”‚       โ”œโ”€โ”€ ml_analytics.py
โ”‚       โ”œโ”€โ”€ output.py
โ”‚       โ”œโ”€โ”€ processing.py
โ”‚       โ”œโ”€โ”€ streaming.py
โ”‚       โ”œโ”€โ”€ storage.py
โ”‚       โ””โ”€โ”€ utils.py
โ”œโ”€โ”€ tests/
โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ””โ”€โ”€ test_manufacturing.py
โ”œโ”€โ”€ .gitignore
โ”œโ”€โ”€ .python-version
โ”œโ”€โ”€ pyproject.toml
โ”œโ”€โ”€ README.md
โ””โ”€โ”€ uv.lock

🔧 Commands

Run the Application

# Run full pipeline
uv run python -m manufacturing_analytics.main

# Run with AI features (requires API keys)
export TAVILY_API_KEY="your-key"
export GROQ_API_KEY="your-key"
uv run python -m manufacturing_analytics.main

Testing

# Run all tests
uv run pytest tests/ -v

# Run specific test file
uv run pytest tests/test_manufacturing.py -v

# Run with coverage (requires the pytest-cov plugin)
uv run pytest tests/ -v --cov=src/manufacturing_analytics --cov-report=html

# Run specific test
uv run pytest tests/test_manufacturing.py::test_version_0_exists -v

Delta Lake Operations

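# Query version history from within Python
# (assumes `spark` is an existing SparkSession configured for Delta Lake)
from delta.tables import DeltaTable
delta_table = DeltaTable.forPath(spark, "lake/manufacturing_delta")
history = delta_table.history()  # DataFrame of versions and operation metadata
history.show()

# Time travel to version 0
v0 = spark.read.format("delta").option("versionAsOf", 0).load("lake/manufacturing_delta")

# Run VACUUM (physically remove unreferenced files older than the retention window)
delta_table.vacuum(retentionHours=168)  # 168 hours = 7 days, the default retention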
