AI Engineer working across the stack on RAG, LLM fine-tuning, and real-time speech systems. My work has moved from computer-vision and speech models toward applied GenAI: fine-tuning open LLMs, building retrieval pipelines on vector DBs, and shipping low-latency audio workflows to the cloud. Based in San Francisco, CA.
- AI Fund — Technical Builder / AI Engineer (Dec 2025–present, Mountain View, CA)
- Ditto AI — AI Engineer (May–Dec 2025, Berkeley, CA)
- JetskiAI — Founding AI/ML Engineer (Mar–Dec 2025, SF Bay Area)
- SuperIntro — AI Software Engineer (Dec 2024–Apr 2025, SF) — Fine-tuned Qwen 2.5 LLM and Stable Diffusion pipelines on Vertex AI with LightRAG; deployed fine-tuned models on GCP with cloud logging and monitoring; integrated via Azure AI Foundry.
- Sizzle — AI Engineer (Jan–Mar 2025, SF) — Designed a low-latency Whisper + Qwen 2.5 + BERT workflow with Librosa/TorchAudio feature extraction; +15% metadata-tagging precision, +20% acoustic-linguistic alignment.
- Melp App, Inc. — Software Developer (AI/ML) (May–Jun 2025, SF Bay Area)
- Seattle University — Research Assistant (Aug 2024–Dec 2025, Seattle, WA) — Under Prof. Pejman Khadivi: fine-tuned Transformers and CNNs for NLP and predictive analytics, with emphasis on automation and model deployment.
- Seattle University — Teaching Assistant, Visual Analytics (Mar–Jun 2024, Seattle, WA)
- SlashRTC — Machine Learning Engineer (Sep 2021–Aug 2022) / ML Intern (Jun–Aug 2021), Mumbai, India — Built Speech-to-Text models with Python and TensorFlow.
Problem: Bridge communication for the deaf and hard-of-hearing by translating American Sign Language signs into spoken audio. Approach: Fine-tune an I3D (Inflated 3D ConvNet) on the WLASL dataset for word-level sign recognition, piping predictions into a TTS stage. I3D captures spatiotemporal features across stacked video frames rather than treating frames independently, which suits the motion-heavy nature of signing. Stack: PyTorch · I3D · WLASL · OpenCV
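
A minimal sketch of the two glue steps around the model, not the project's actual code: choosing which frames get stacked into one fixed-length clip, and turning the classifier's scores into a word for the TTS stage. The function names, the clip length, and the tiny vocabulary are all hypothetical, and the model itself is left out.

```python
from typing import List, Sequence


def sample_clip_indices(num_frames: int, clip_len: int) -> List[int]:
    """Pick clip_len frame indices spread evenly over a video.

    Videos shorter than clip_len are padded by repeating the last frame,
    so every clip handed to the model has the same temporal length.
    (Hypothetical helper; the sampling scheme is an assumption.)
    """
    if num_frames <= 0 or clip_len <= 0:
        raise ValueError("num_frames and clip_len must be positive")
    if num_frames < clip_len:
        # Use every frame once, then repeat the final frame as padding.
        return list(range(num_frames)) + [num_frames - 1] * (clip_len - num_frames)
    if clip_len == 1:
        return [0]
    # Evenly spaced positions from the first frame to the last.
    step = (num_frames - 1) / (clip_len - 1)
    return [round(i * step) for i in range(clip_len)]


def top_gloss(scores: Sequence[float], vocab: Sequence[str]) -> str:
    """Return the vocabulary word with the highest score (argmax).

    The returned string is what would be passed on to a TTS stage.
    """
    if not scores or len(scores) != len(vocab):
        raise ValueError("scores and vocab must be non-empty and the same length")
    best = max(range(len(scores)), key=scores.__getitem__)
    return vocab[best]


# A 10-frame video sampled down to a 4-frame clip.
print(sample_clip_indices(10, 4))  # [0, 3, 6, 9]
# Made-up scores over a made-up three-word vocabulary.
print(top_gloss([0.1, 2.3, -0.5], ["book", "drink", "help"]))  # drink
```

The selected frames would be stacked along a time axis before being passed to a 3D ConvNet, which is what lets the network see motion across frames instead of one image at a time.
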
Problem: Enable live cross-language conversation without the stop-and-wait of batch translation. Approach: A streaming pipeline chains Whisper (ASR) → translation → OpenAI TTS, with audio streamed in and out continuously. It prioritizes low end-to-end latency by keeping the stages pipelined rather than processing each utterance as a discrete block. Stack: Whisper · OpenAI TTS · Python · streaming audio I/O
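
The pipelining idea can be sketched with one worker thread per stage, connected by queues, so a later chunk can be transcribed while an earlier one is still being translated or synthesized. This is an illustration under stated assumptions, not the project's code: `transcribe`, `translate`, and `synthesize` are hypothetical string-returning stand-ins for the Whisper, translation, and TTS calls, and error handling is omitted.

```python
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # sentinel that marks the end of the stream


def _stage(fn: Callable, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply fn to each item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # pass the end-of-stream marker downstream
            return
        outbox.put(fn(item))


def run_pipeline(chunks: Iterable, stages: List[Callable]) -> List:
    """Push chunks through the stages, each stage running in its own thread."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=_stage, args=(fn, queues[i], queues[i + 1]), daemon=True)
        for i, fn in enumerate(stages)
    ]
    for worker in workers:
        worker.start()
    for chunk in chunks:
        queues[0].put(chunk)  # stages start on early chunks right away
    queues[0].put(_DONE)

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results


# Hypothetical placeholders for the real ASR, translation, and TTS calls.
def transcribe(audio_chunk: str) -> str:
    return f"text({audio_chunk})"


def translate(text: str) -> str:
    return f"translated({text})"


def synthesize(text: str) -> str:
    return f"audio({text})"


print(run_pipeline(["a1", "a2"], [transcribe, translate, synthesize]))
# ['audio(translated(text(a1)))', 'audio(translated(text(a2)))']
```

Because each stage is a single thread reading from a FIFO queue, output order matches input order. A real-time version would play each result as it arrives instead of collecting a list.
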
Languages
ML / Deep Learning
GenAI / LLM
Speech / Audio · Vision
Cloud / Infra
I enjoy giving back to the AI/ML community and am always happy to:
- 🏆 Judge hackathons, demo days, and AI/ML competitions
- 🎤 Give interviews, talks & guest sessions on applied GenAI, RAG, and real-time speech
- 🧭 Mentor & guide engineers and students breaking into AI/ML
- 💡 Consult & advise on AI/ML product direction and architecture
📫 Reach me at rakeshutekar60@gmail.com or on LinkedIn.
- MS, Computer Science (Data Science Specialization) — Seattle University (Sep 2022–Aug 2024)
- BTech, Computer Engineering — University of Mumbai (2016–2021)





