pvcodes/README.md

Pranjal Verma

Data Engineer · GCP Specialist · Open to Opportunities

Website · LinkedIn · Twitter · Blog · Email


Data Engineer with 2+ years of production experience designing and operating cloud-native data infrastructure on GCP. Currently at Accenture, building real-time event-driven pipelines and a BigQuery warehouse serving analytical workloads at scale. GCP Professional Data Engineer certified.


Experience

Data Engineer — Accenture · Sept 2024 – Present · Pune, India

  • Redesigned BigQuery data warehouse partitioning and clustering strategy; reduced average query latency by 60% and monthly compute costs by 35%
  • Built real-time event-driven ingestion pipelines using Cloud Run, EventArc, and Apache Kafka
  • Orchestrated multi-sink ETL workflows with Cloud Composer (Airflow), delivering to Elasticsearch, MySQL, PostgreSQL, and Kafka topics

Data Engineer — Walkover · Jan 2024 – Sept 2024

  • Designed backend data infrastructure for a workflow automation platform serving 10,000+ concurrent users
  • Architected fault-tolerant RabbitMQ pipelines, achieving 50% throughput improvement over the prior architecture
  • Optimized database access patterns with batched reads and writes, making retrieval 30% faster under peak load

Selected Projects

| Project | Description |
|---|---|
| VLR Analytics | Cloud-native data lakehouse on GCP using Medallion Architecture (Bronze → Silver → Gold). Processes 500K–1M gaming records via PySpark on Dataproc Serverless. Infrastructure managed with Terraform, orchestration via Cloud Composer. |
| Medical Risk Prediction | Ensemble ML model (Random Forest, XGBoost, Logistic Regression) on the NHANES dataset (8,000+ records). Achieved 90.4% accuracy and 0.88 AUC-ROC with feature engineering across 145 clinical variables. |
| ERDiagram-To-Schema | Fine-tuned Qwen2.5-VL to generate database schemas from ER diagram images. Achieved 89.2% table accuracy and 90% relationship accuracy on a held-out evaluation set. |
| ERP-CRM Data Warehouse | Enterprise data warehouse for ERP/CRM source systems with dimensional modeling and dbt-style transformation layers. |
| Realtime Retail Analytics | End-to-end streaming pipeline with Kafka producers, Spark Structured Streaming consumers, and a live analytics dashboard. |
| LLMify | Multi-model LLM chatbot platform with a unified interface across OpenAI, Anthropic, and Google providers. Per-IP rate limiting, abort signal propagation, and file-based chat persistence. |

Stack

Languages — Python · SQL · PySpark · TypeScript · Bash

GCP — BigQuery · Cloud Run · Dataflow · Dataproc · Pub/Sub · Cloud Composer

Data Engineering — Apache Kafka · Apache Airflow · Apache Spark · Terraform · dbt

Databases — PostgreSQL · MySQL · Elasticsearch · Redis


Certifications

GCP Professional Data Engineer · GCP Associate Cloud Engineer · Databricks Data Engineer Associate


Education

MCA — Devi Ahilya Vishwavidyalaya, Indore · 2024

BCA — Integral University, Lucknow · 2022


Open to Data Engineering, Analytics Engineering, and Platform roles · hi@pvcodes.in
