About
👋 Hi, I’m Akash, an applied researcher/engineer with experience in speech and audio (at Microsoft) and, most recently, multimodal document understanding and retrieval (at Contextual AI). Turns out this completes the multimodality trio of audio, vision & text. :)
I’m on a brief sabbatical, exploring ideas and tinkering as I work out what’s next. At the moment I’m exploring real-time, on-device neural audio in the context of music and voice.
Work
Contextual AI
[2024-25]
Wrangled millions of pages to land the first millions of dollars in enterprise contracts :)
Contextual AI builds a platform for knowledge agents (i.e. RAG). I joined before the Series A and the product launch.
- Product Development (0→1): Built core multimodal document understanding (parsing/OCR, representation) powering document ingestion and retrieval.
- Applied research: Vision models, VLM workflow/agent framework, eval/annotation process & tooling, document representation for agentic retrieval.
- Tech lead / manager: roadmap and release planning; DRI coordinating with forward-deployed, engineering, PM, and marketing teams; mentored teammates and interviewed candidates.
Microsoft
[2018-23]
Fun fact: at ~6M hours of monthly traffic, that’s roughly one *year* of conversations transcribed every hour!
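A quick back-of-envelope check of that ratio (assuming a ~30-day month; the ~6M hours/month figure is from above):

```python
# How much audio gets transcribed per hour of wall-clock time?
monthly_audio_hours = 6e6             # ~6M hours of audio transcribed per month
wall_clock_hours_per_month = 30 * 24  # ~720 wall-clock hours in a month

audio_hours_per_clock_hour = monthly_audio_hours / wall_clock_hours_per_month
years_of_audio = audio_hours_per_clock_hour / (365 * 24)

# ~8333 hours of audio per clock hour, i.e. ~0.95 years — about one year per hour
print(f"{audio_hours_per_clock_hour:.0f} audio hours ≈ {years_of_audio:.2f} years, every hour")
```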
- Shipped state-of-the-art transcription models designed for scale on Azure and Microsoft Teams APIs [O(1e7) hrs/month].
- Applied research: data & training recipes, evaluation metrics & error analysis, scalability-focused architecture design.
- Research engineering: data pipeline, distributed training framework, optimizing training & inference in ONNX/C++.
- ‘Graduated’ as one of the few non-speech-PhD senior members on the team :)
Misc
Open source
- [2023] 🐥🗣️ Built tinydiarize, a lightweight prototype extending OpenAI’s Whisper model for speaker diarization that runs on MacBooks/iPhones, and contributed it to whisper.cpp (38k stars).
- [2020] 🐋 Co-founded OrcaHello, a system for 24/7 monitoring of Southern Resident Killer Whales across a network of underwater microphones (“hydrophones”) in the Pacific Northwest. It was awarded a $30,000 AI for Earth Innovation Grant in 2020 and has been operating in the wild for 4+ years. Listen live or browse past detections.
- [2018] 🗣️ Built “Attention, I’m Trying to Speak”: speech synthesis with just $75 of compute. Got to fist-bump Richard Socher for the Stanford CS224n project award :)
Other
- [2016/17] Wrote case studies on the music streaming industry while studying business/tech strategy at Stanford MS&E.
- [2014] Organized what was then Chennai’s largest EDM gig, with 5k+ attendees, during my undergrad at IIT Madras.