In this capstone project, I took on the simulated role of a Junior Data Engineer and designed and implemented a scalable, end-to-end data analytics platform. The project applies key data engineering skills across the data engineering lifecycle, using multiple tools and technologies.
It’s the final course in the IBM Data Engineering Professional Certificate, combining all prior learning into one practical project.
✅ Design and build data platforms using OLTP & OLAP architectures
✅ Implement data pipelines with ETL processes using Python and Apache Airflow
✅ Query structured and unstructured data using MySQL, PostgreSQL, and MongoDB
✅ Perform big data analytics and ML predictions using Apache Spark
✅ Visualize insights via dashboards in Google Looker Studio and IBM Cognos Analytics
- 🐍 Python & SQL
- 🐘 PostgreSQL | 🐬 MySQL | 🍃 MongoDB
- 🛠️ Apache Airflow
- 🔍 Apache Spark (MLlib)
- 📊 IBM Cognos Analytics | Google Looker Studio
- 🗃️ OLTP & Data Warehousing
- 🧱 ETL & Data Pipelines
- 🐧 Linux Shell Scripting
- 📂 JSON, CSV, .tar.gz, and data transformations
| Module | Description |
|---|---|
| 📁 1. Data Platform Architecture & OLTP | Designed OLTP schemas & created MySQL databases |
| 🍃 2. NoSQL with MongoDB | Queried JSON documents and used MongoDB indexes |
| 🗄️ 3. Data Warehouse | Built dimensional models & populated warehouse tables |
| 📈 4. Data Analytics & Reporting | Wrote complex SQL queries with ROLLUP, CUBE, and aggregations |
| 🔁 5. ETL & Pipelines | Built ETL flows with Python scripts and Apache Airflow DAGs |
| ⚡ 6. Big Data Analytics with Spark | Trained and deployed ML models using Spark MLlib |
| ✅ 7. Final Submission | Delivered final reports, dashboards, and peer-reviewed projects |
- 📁 OLTP Database Design
- 📁 NoSQL Queries & Exports
- 📁 Data Warehouse Scripts & CSVs
- 📁 Airflow DAGs & Python Scripts
- 📁 SparkML Model & Predictions
- 📁 Dashboards (Google Looker, Cognos)
- 🗃️ Relational & NoSQL Database Design (MySQL, MongoDB)
- 🏗️ Data Warehouse Modeling and Querying (PostgreSQL, IBM Db2)
- 🔄 ETL Pipeline Development (Python, Shell, Apache Airflow)
- 🔥 Big Data Analytics with Apache Spark
- 📊 Data Visualization (Google Looker Studio, IBM Cognos Analytics)
- 🐧 Linux Shell Scripting
- 🧪 SQL queries using `ROLLUP`, `CUBE`, `GROUPING SETS`, and Materialized Query Tables (MQTs)
- Designed an OLTP schema and created MySQL tables.
- Imported and exported data using SQL and shell scripts.
- Defined primary keys and indexes for optimized access.
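The OLTP steps above (schema with a primary key and an index, CSV import, CSV export) can be sketched in Python. This is a minimal illustration, not the project's actual scripts: it uses the standard-library `sqlite3` module as a stand-in for MySQL so it runs without a server, and the `sales_data` table, its columns, and the index name are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical OLTP schema; SQLite stands in for MySQL so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sales_data (
    sale_id     INTEGER PRIMARY KEY,   -- primary key for row identity
    product_id  INTEGER NOT NULL,
    customer_id INTEGER NOT NULL,
    price       REAL    NOT NULL,
    quantity    INTEGER NOT NULL,
    sold_at     TEXT    NOT NULL
);
-- Secondary index so time-range lookups avoid a full table scan.
CREATE INDEX IF NOT EXISTS idx_sales_sold_at ON sales_data (sold_at);
"""

COLUMNS = ["product_id", "customer_id", "price", "quantity", "sold_at"]


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the table and its index (safe to run more than once)."""
    conn.executescript(SCHEMA)


def import_csv(conn: sqlite3.Connection, csv_text: str) -> int:
    """Insert rows from CSV text with a header row; returns the number of rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (int(r["product_id"]), int(r["customer_id"]), float(r["price"]),
         int(r["quantity"]), r["sold_at"])
        for r in reader
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO sales_data (product_id, customer_id, price, quantity, sold_at) "
            "VALUES (?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)


def export_csv(conn: sqlite3.Connection) -> str:
    """Dump the table to CSV text (header plus rows), ordered by primary key."""
    cur = conn.execute(f"SELECT {', '.join(COLUMNS)} FROM sales_data ORDER BY sale_id")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(cur)
    return out.getvalue()
```

In MySQL itself the same DDL works with minor type changes, and bulk loading would more typically go through `LOAD DATA INFILE` or a shell script calling the `mysql` client.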
- Loaded product catalog data into MongoDB.
- Performed filter queries and aggregation pipelines.
- Exported collections using `mongoexport`.
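The MongoDB filter and aggregation steps above can be sketched as plain Python dictionaries, which is the form `pymongo` accepts. This is a hedged illustration: the `category` and `price` field names are assumptions, and `avg_price_by_category` only relies on the collection exposing an `aggregate()` method, as a `pymongo` `Collection` does.

```python
from typing import Any, Dict, List


def build_filter(category: str, max_price: float) -> Dict[str, Any]:
    """Filter document for find(): products in one category under a price cap."""
    return {"category": category, "price": {"$lt": max_price}}


def build_avg_price_pipeline(min_count: int = 1) -> List[Dict[str, Any]]:
    """Aggregation pipeline: average price and item count per category."""
    return [
        # Group documents by category, computing the mean price and a count.
        {"$group": {"_id": "$category",
                    "avg_price": {"$avg": "$price"},
                    "count": {"$sum": 1}}},
        # Keep only categories with enough products.
        {"$match": {"count": {"$gte": min_count}}},
        # Most expensive categories first.
        {"$sort": {"avg_price": -1}},
    ]


def avg_price_by_category(collection, min_count: int = 1) -> List[Dict[str, Any]]:
    """Run the pipeline on a pymongo Collection (or any object with .aggregate)."""
    return list(collection.aggregate(build_avg_price_pipeline(min_count)))
```

With a real collection, `collection.find(build_filter("laptops", 500))` returns the matching documents. For the export step, a command of the shape `mongoexport --db <db> --collection <collection> --type=csv --fields category,price --out products.csv` writes selected fields to CSV (the database and collection names are placeholders).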
- Created a star schema with dimension and fact tables in PostgreSQL.
- Imported e-commerce sales data.
- Performed OLAP queries with `CUBE`, `ROLLUP`, and `GROUPING SETS`.
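To make the OLAP operators above concrete, the following Python sketch reproduces which rows `ROLLUP`, `CUBE`, and `GROUPING SETS` return, using in-memory rows. It illustrates the SQL semantics only, not how PostgreSQL executes these queries, and the `year`, `country`, and `amount` columns are hypothetical.

```python
from itertools import combinations
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]

# Equivalent PostgreSQL for a hypothetical fact table `sales`:
#   SELECT year, country, SUM(amount) FROM sales GROUP BY ROLLUP (year, country);


def rollup(cols: Sequence[str]) -> List[Tuple[str, ...]]:
    """ROLLUP(a, b) is shorthand for GROUPING SETS ((a, b), (a), ())."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube(cols: Sequence[str]) -> List[Tuple[str, ...]]:
    """CUBE(a, b) is shorthand for GROUPING SETS over every subset of the columns."""
    return [c for k in range(len(cols), -1, -1) for c in combinations(cols, k)]


def grouping_sets(rows: List[Row], dims: Sequence[str],
                  sets: List[Tuple[str, ...]], measure: str) -> List[Row]:
    """Sum `measure` per grouping set, like GROUP BY GROUPING SETS (...).

    Dimensions outside a set come back as None, which is how SQL reports a
    subtotal (as NULL). SQL's GROUPING() function is needed to tell a subtotal
    NULL apart from a genuine NULL in the data; this sketch does not do that.
    """
    out: List[Row] = []
    for gs in sets:
        totals: Dict[Tuple[Any, ...], Any] = {}
        for r in rows:
            key = tuple(r[d] if d in gs else None for d in dims)
            totals[key] = totals.get(key, 0) + r[measure]
        out.extend({**dict(zip(dims, key)), measure: total}
                   for key, total in totals.items())
    return out
```

For two dimensions, `ROLLUP` yields detail rows, per-year subtotals, and a grand total, while `CUBE` also adds per-country subtotals.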
- Wrote analytical SQL queries to uncover trends in sales data.
- Used Materialized Query Tables (MQTs) to precompute aggregates and improve query performance.
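The MQT idea above, storing a query's result as a table and refreshing it on demand, can be emulated in Python. This sketch uses `sqlite3`, which has no MQTs, so a plain summary table plus an explicit refresh function stands in for Db2's `REFRESH TABLE`; the `sales` and `total_sales_per_country` names are hypothetical.

```python
import sqlite3

# Db2 form for reference (table and column names hypothetical):
#   CREATE TABLE total_sales_per_country (country, total_sales) AS
#     (SELECT country, SUM(amount) FROM sales GROUP BY country)
#     DATA INITIALLY DEFERRED REFRESH DEFERRED;
#   REFRESH TABLE total_sales_per_country;


def create_summary(conn: sqlite3.Connection) -> None:
    """Create the precomputed table and populate it once."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS total_sales_per_country "
        "(country TEXT PRIMARY KEY, total_sales REAL)"
    )
    refresh_summary(conn)


def refresh_summary(conn: sqlite3.Connection) -> None:
    """Recompute the summary in one transaction, similar to REFRESH TABLE."""
    with conn:  # DELETE and INSERT commit together or not at all
        conn.execute("DELETE FROM total_sales_per_country")
        conn.execute(
            "INSERT INTO total_sales_per_country (country, total_sales) "
            "SELECT country, SUM(amount) FROM sales GROUP BY country"
        )
```

The trade-off is the same as with a real MQT. Reads against the summary are cheap, but the table goes stale when `sales` changes until it is refreshed.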
- Wrote Python scripts for extract, transform, and load processes.
- Automated the pipeline using Apache Airflow DAGs.
- Processed and cleaned web logs into a structured format.
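The ETL steps above can be sketched as three small Python functions: extract parses raw web-log lines, transform filters them, and load writes a CSV and archives it as `.tar.gz`. This is an illustrative sketch, not the project's script. It assumes logs in the common Apache log format, and the excluded IP, field names, and output filenames are hypothetical.

```python
import csv
import re
import tarfile
from pathlib import Path
from typing import Dict, Iterable, List, Set

# Common Log Format: IP, identity, user, [timestamp], "request", status, size
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')
FIELDS = ["ip", "timestamp", "request", "status", "bytes"]


def extract(lines: Iterable[str]) -> List[Dict]:
    """Parse raw log lines into dicts, skipping lines that do not match."""
    records = []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m:
            ip, ts, request, status, size = m.groups()
            records.append({"ip": ip, "timestamp": ts, "request": request,
                            "status": int(status),
                            "bytes": 0 if size == "-" else int(size)})
    return records


def transform(records: List[Dict], excluded_ips: Set[str]) -> List[Dict]:
    """Drop records from excluded IPs (for example, internal health checks)."""
    return [r for r in records if r["ip"] not in excluded_ips]


def load(records: List[Dict], out_dir: Path) -> Path:
    """Write records to CSV, then archive the CSV into a .tar.gz file."""
    csv_path = out_dir / "weblog.csv"
    with csv_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)
    archive = out_dir / "weblog.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(csv_path), arcname=csv_path.name)
    return archive
```

In an Airflow DAG, these three functions would typically become separate tasks (for example `PythonOperator` tasks chained with `>>`), so each stage can be retried and monitored on its own.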
- Used Spark to load and transform product review data.
- Built a machine learning model using Spark MLlib.
- Saved and reloaded the trained model for prediction tasks.
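The train, save, and reload cycle above can be sketched with PySpark's MLlib API. This is a hedged sketch. It assumes a linear regression on numeric columns (the model type is an assumption, not stated here), and the column names, CSV file, and model path in the usage comment are hypothetical.

```python
from typing import List


def train_save_reload(df, feature_cols: List[str], label_col: str, model_path: str):
    """Fit a linear regression on `df`, save it, reload it, and return predictions.

    `df` is assumed to be a pyspark.sql.DataFrame with numeric feature and label columns.
    """
    # Imported lazily so this module can be loaded even where PySpark is absent.
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression, LinearRegressionModel

    # MLlib estimators expect the features packed into a single vector column.
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    data = assembler.transform(df).select("features", label_col)

    model = LinearRegression(featuresCol="features", labelCol=label_col).fit(data)

    # Persist the fitted model; overwrite() replaces any earlier save at this path.
    model.write().overwrite().save(model_path)

    # Reload from disk, as a separate prediction job would, then score the data.
    reloaded = LinearRegressionModel.load(model_path)
    return reloaded.transform(data)  # adds a "prediction" column


# Example usage (requires a Spark installation; paths and columns are hypothetical):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("capstone-sketch").getOrCreate()
# df = spark.read.csv("sales.csv", header=True, inferSchema=True)
# train_save_reload(df, ["year"], "price", "/tmp/sales_model").show(5)
```

Saving and reloading separates training from inference. A scheduled job can load the persisted model and score new data without retraining.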
- Built sales dashboards using:
  - Google Looker Studio: interactive charts, filters, and KPIs.
  - IBM Cognos Analytics: custom visualizations and report generation.
- Submitted final project artifacts for peer review.
This project helped solidify my knowledge of:
- Building data infrastructure from the ground up
- Managing both structured and semi-structured data
- Automating and scaling data workflows
- Communicating data insights through visual tools
✅ Proficiency in end-to-end data engineering workflows
✅ Prepared for real-world junior-level data engineering roles
This project was a culmination of weeks of learning and hands-on practice. I strengthened my data engineering foundations and became confident in building real-world data solutions end-to-end. 🧩💡
- Hiring managers evaluating full-stack data engineers
- Recruiters seeking professionals skilled in data architecture, pipelines, and analytics
- Anyone interested in practical data engineering workflows
- Loyalty & Sales Performance Dashboard
- E-Commerce_Sales_Dashboard_(2020)
- Simple_Dashboard
- Community_Property_Revenue_&_Loyalty_Sales_Dashboard
- Sales & Service Dashboard
If you're interested in my other data projects or collaborations:
🌐 My Portfolio | 💼 LinkedIn | 📂 GitHub Projects