Data Engineer with 2+ years of production experience designing and operating cloud-native data infrastructure on GCP. Currently at Accenture, building real-time event-driven pipelines and a BigQuery warehouse serving analytical workloads at scale. GCP Professional Data Engineer certified.
Data Engineer — Accenture Sept 2024 – Present · Pune, India
- Redesigned BigQuery data warehouse partitioning and clustering strategy; reduced average query latency by 60% and monthly compute costs by 35%
- Built real-time event-driven ingestion pipelines using Cloud Run, Eventarc, and Apache Kafka
- Orchestrated multi-sink ETL workflows with Cloud Composer (Airflow), delivering to Elasticsearch, MySQL, PostgreSQL, and Kafka topics
Data Engineer — Walkover Jan 2024 – Sept 2024
- Designed backend data infrastructure for a workflow automation platform serving 10,000+ concurrent users
- Architected fault-tolerant RabbitMQ pipelines, achieving a 50% throughput improvement over the prior architecture
- Optimized database access patterns with batched reads and writes, achieving 30% faster retrieval under peak load
| Project | Description |
|---|---|
| VLR Analytics | Cloud-native data lakehouse on GCP using a medallion architecture (Bronze → Silver → Gold). Processes 500K–1M gaming records via PySpark on Dataproc Serverless. Infrastructure managed with Terraform, orchestration via Cloud Composer. |
| Medical Risk Prediction | Ensemble ML model (Random Forest, XGBoost, Logistic Regression) on the NHANES dataset (8,000+ records). Achieved 90.4% accuracy and 0.88 AUC-ROC with feature engineering across 145 clinical variables. |
| ERDiagram-To-Schema | Fine-tuned Qwen2.5-VL to generate database schemas from ER diagram images. Achieved 89.2% table accuracy and 90% relationship accuracy on a held-out evaluation set. |
| ERP-CRM Data Warehouse | Enterprise data warehouse for ERP/CRM source systems with dimensional modeling and dbt-style transformation layers. |
| Realtime Retail Analytics | End-to-end streaming pipeline with Kafka producers, Spark Structured Streaming consumers, and a live analytics dashboard. |
| LLMify | Multi-model LLM chatbot platform with a unified interface across OpenAI, Anthropic, and Google providers. Supports per-IP rate limiting, abort signal propagation, and file-based chat persistence. |
Languages — Python · SQL · PySpark · TypeScript · Bash
GCP — BigQuery · Cloud Run · Dataflow · Dataproc · Pub/Sub · Cloud Composer
Data Engineering — Apache Kafka · Apache Airflow · Apache Spark · Terraform · dbt
Databases — PostgreSQL · MySQL · Elasticsearch · Redis
MCA — Devi Ahilya Vishwavidyalaya, Indore · 2024
BCA — Integral University, Lucknow · 2022