Rakesh Utekar rakeshutekar

AI Engineer working across the stack on RAG, LLM fine-tuning, and real-time speech systems. My work has moved from computer-vision and speech models toward applied GenAI: fine-tuning open LLMs, building retrieval pipelines on vector databases, and shipping low-latency audio workflows to the cloud. Based in San Francisco, CA.

Experience

AI Fund — Technical Builder / AI Engineer (Dec 2025–present, Mountain View, CA)
Ditto AI — AI Engineer (May–Dec 2025, Berkeley, CA)
JetskiAI — Founding AI/ML Engineer (Mar–Dec 2025, SF Bay Area)
SuperIntro — AI Software Engineer (Dec 2024–Apr 2025, SF) — Fine-tuned Qwen 2.5 LLM and Stable Diffusion pipelines on Vertex AI with LightRAG; deployed fine-tuned models on GCP with cloud logging and monitoring; integrated via Azure AI Foundry.
Sizzle — AI Engineer (Jan–Mar 2025, SF) — Designed a low-latency Whisper + Qwen 2.5 + BERT workflow with Librosa/TorchAudio feature extraction; +15% metadata-tagging precision, +20% acoustic-linguistic alignment.
Melp App, Inc. — Software Developer (AI/ML) (May–Jun 2025, SF Bay Area)
Seattle University — Research Assistant (Aug 2024–Dec 2025, Seattle, WA) — Under Prof. Pejman Khadivi: fine-tuned Transformers and CNNs for NLP and predictive analytics, with emphasis on automation and model deployment.
Seattle University — Teaching Assistant, Visual Analytics (Mar–Jun 2024, Seattle, WA)
SlashRTC — Machine Learning Engineer (Sep 2021–Aug 2022) / ML Intern (Jun–Aug 2021), Mumbai, India — Built Speech-to-Text models with Python and TensorFlow.

Flagship Projects

Real-Time Sign Language → Speech

Problem: Bridge communication for the deaf and hard-of-hearing by translating American Sign Language signs into spoken audio.

Approach: Fine-tune an I3D (Inflated 3D ConvNet) on the WLASL dataset for word-level sign recognition, piping predictions into a TTS stage. I3D captures spatiotemporal features across stacked video frames rather than treating frames independently, which suits the motion-heavy nature of signing.
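
To illustrate what "stacked video frames" means in practice, here is a minimal sketch of how a live frame stream could be grouped into overlapping fixed-length clips before being handed to a 3D ConvNet such as I3D. It is not the repo's code: the function name `sliding_clips` and the `clip_len` and `stride` defaults are assumptions chosen for illustration, and frames are treated as opaque objects so the example runs with the standard library alone.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, TypeVar

Frame = TypeVar("Frame")


def sliding_clips(
    frames: Iterable[Frame],
    clip_len: int = 16,  # assumed clip length, not taken from the project
    stride: int = 8,     # assumed hop between consecutive clip starts
) -> Iterator[List[Frame]]:
    """Yield overlapping clips of `clip_len` consecutive frames.

    A 3D ConvNet consumes a whole clip at once, so temporal context
    comes from the stacked frames rather than from any single image.
    """
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")

    # The deque drops the oldest frame automatically once it is full.
    window: Deque[Frame] = deque(maxlen=clip_len)
    for index, frame in enumerate(frames):
        window.append(frame)
        start = index - clip_len + 1  # index of the first frame in the window
        if start >= 0 and start % stride == 0:
            yield list(window)
```

In a real pipeline each yielded clip would be converted to a tensor and passed to the recognizer, and the predicted word would then go to the TTS stage. A smaller stride gives more frequent predictions at the cost of more inference calls.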

Stack: PyTorch · I3D · WLASL · OpenCV

View repo →

Real-Time Speech → Speech Translation

Problem: Enable live cross-language conversation without the stop-and-wait of batch translation.

Approach: A streaming pipeline chains Whisper (ASR) → translation → OpenAI TTS, with audio streamed in and out continuously. It prioritizes low end-to-end latency by keeping the stages pipelined rather than processing each utterance as a discrete block.
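
The pipelining idea can be sketched with one worker thread per stage connected by bounded queues, so that a later chunk can be transcribed while an earlier one is still being translated or synthesized. This is a rough illustration under stated assumptions rather than the project's implementation: `run_pipeline`, `fake_asr`, `fake_translate`, and `fake_tts` are hypothetical names, and the three stage functions are stand-ins that make no Whisper or OpenAI calls.

```python
import queue
import threading
from typing import Any, Callable, Iterable, Iterator, Sequence

_STOP = object()  # sentinel that marks the end of the stream


def _run_stage(fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply `fn` to each item from `inbox` and forward the result."""
    # Error handling is omitted: an exception in `fn` would stall the sketch.
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown downstream
            return
        outbox.put(fn(item))


def run_pipeline(
    chunks: Iterable[Any],
    stages: Sequence[Callable[[Any], Any]],
    max_pending: int = 8,  # assumed queue bound, provides backpressure
) -> Iterator[Any]:
    """Stream chunks through the stages, yielding results in input order."""
    queues = [queue.Queue(maxsize=max_pending) for _ in range(len(stages) + 1)]
    for i, fn in enumerate(stages):
        threading.Thread(
            target=_run_stage, args=(fn, queues[i], queues[i + 1]), daemon=True
        ).start()

    def feed() -> None:
        for chunk in chunks:
            queues[0].put(chunk)
        queues[0].put(_STOP)

    # Feeding runs in its own thread so a full queue cannot block the consumer.
    threading.Thread(target=feed, daemon=True).start()

    while True:
        result = queues[-1].get()
        if result is _STOP:
            return
        yield result


# Hypothetical stand-ins for the ASR, translation, and TTS stages.
def fake_asr(audio_chunk: str) -> str:
    return f"text({audio_chunk})"


def fake_translate(text: str) -> str:
    return f"translated({text})"


def fake_tts(text: str) -> str:
    return f"audio({text})"
```

Calling `run_pipeline(["c1", "c2"], [fake_asr, fake_translate, fake_tts])` yields one synthesized result per input chunk as soon as it leaves the last stage. Each stage has a single worker reading a FIFO queue, so output order matches input order.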

Stack: Whisper · OpenAI TTS · Python · streaming audio I/O

View repo →

Tech Stack

Languages

ML / Deep Learning

GenAI / LLM

Speech / Audio · Vision

Cloud / Infra

🤝 Open to Collaborate

I enjoy giving back to the AI/ML community and am always happy to:

🏆 Judge hackathons, demo days, and AI/ML competitions
🎤 Give interviews, talks & guest sessions on applied GenAI, RAG, and real-time speech
🧭 Mentor & guide engineers and students breaking into AI/ML
💡 Consult & advise on AI/ML product direction and architecture

📫 Reach me at rakeshutekar60@gmail.com or on LinkedIn.

📊 GitHub Analytics

🟡 Watch Pac-Man eat my contributions

Education

MS, Computer Science (Data Science Specialization) — Seattle University (Sep 2022–Aug 2024)
BTech, Computer Engineering — University of Mumbai (2016–2021)
