Welcome to my GitHub! I'm a Data Scientist & Machine Learning Enthusiast passionate about turning raw data into meaningful insights. With experience in data analysis, machine learning, and visualization, I love solving complex problems and making data-driven decisions.
Education:
- M.S., Computer Science (Machine Learning) | Georgia Tech (Expected Dec 2027)
- B.A., Data Science (Business & Industrial Analysis) | University of California, Berkeley (Dec 2024)
What I Do:
- Build machine learning models to solve real-world problems
- Design data pipelines for predictive analytics
- Develop interactive visualizations to make data more accessible
- Apply statistical analysis & A/B testing for data-driven insights
Tech Stack:
- Languages: Python, SQL
- Frameworks & Libraries: Pandas, NumPy, Scikit-learn, PyTorch, LangChain
- ML & AI: Retrieval-Augmented Generation (RAG), semantic search, embeddings (sentence-transformers/all-MiniLM-L6-v2), LLM prompt engineering
- Databases & Infra: ChromaDB (vector store), persistent local storage
- Tools: Tableau, Jupyter Notebook, Google Colab, Git, Hugging Face ecosystem
A retrieval-augmented generation (RAG) AI mentor that guides job seekers through data science interview preparation using knowledge from textbooks, career advice, and online community discussions such as Reddit and Quora. Unlike generic chatbots, this mentor always cites its sources, building trust through transparency. It addresses a critical gap: quality career mentorship is scarce and limited by volunteer availability, while this tool scales guidance to anyone, anytime.
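To illustrate the "always cites its sources" idea, here is a minimal sketch of retrieval with source attribution. It is not the project's actual code: the passages, the source labels, and the function names (`retrieve`, `answer_with_citations`) are hypothetical, and a simple bag-of-words cosine similarity stands in for the embedding model and vector store so the example runs with the standard library alone.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus: every passage keeps the label of its source.
PASSAGES = [
    {"source": "textbook: statistics chapter",
     "text": "A p-value is the probability of observing data at least as "
             "extreme as the sample, assuming the null hypothesis is true."},
    {"source": "career guide: interview tips",
     "text": "In a behavioral interview, structure each answer around the "
             "situation, task, action, and result."},
    {"source": "community thread: SQL rounds",
     "text": "SQL interview rounds often test joins, window functions, and "
             "aggregation with group by."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=2):
    """Return up to k passages with non-zero similarity to the question."""
    query = Counter(tokenize(question))
    scored = [(cosine(query, Counter(tokenize(p["text"]))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def answer_with_citations(question, passages, k=2):
    """Format the retrieved passages, each followed by its source label."""
    hits = retrieve(question, passages, k)
    if not hits:
        return "No supporting passage found."
    return "\n".join(
        f"[{i}] {p['text']} (source: {p['source']})"
        for i, p in enumerate(hits, start=1)
    )


print(answer_with_citations("Which SQL window functions should I practice?", PASSAGES))
```

In a full RAG setup, the retrieved passages would be passed to a language model as context, and the source labels would be carried through to the final answer.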
This project investigates the relationship between air quality (measured through PM2.5 levels) and socioeconomic mobility across California counties. By combining environmental and economic datasets, I aim to understand how exposure to poor air quality during childhood influences long-term economic outcomes. My analysis employs causal inference and predictive modeling techniques to assess and quantify these relationships.
This project analyzes and predicts chronic absenteeism among students in schools within the Oakland District. Using historical attendance, demographic, and academic data, I build machine learning models that predict absenteeism risk and provide insights for early intervention.