👋 Hi, I’m Akash, an applied researcher/engineer with experience in speech and audio (at Microsoft) and, most recently, multimodal document understanding and retrieval (at Contextual AI). Turns out this completes the AI multimodality trio of audio, vision & text. :)
I’m currently on a brief sabbatical, exploring ideas & tinkering as I work out what’s next. Right now that means real-time, on-device neural audio for music and voice.
Work
Contextual AI
Wrangled millions of pages to land the first millions of dollars in enterprise contracts :)
- 0→1: Built the core multimodal document understanding system (parsing, representation, and ingestion) powering retrieval for the context layer over enterprise documents (illustrative sketch after this list).
- Applied research: Vision models, VLM workflow/agent framework, document representation for agentic retrieval (demo below).
- Tech Lead / Manager: Mentored and managed engineers, interviewed candidates, and served as DRI coordinating with forward-deployed engineering, marketing, and PM.
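A minimal sketch of what a parse → represent → ingest pipeline like this can look like. Every name and stage here is an illustrative assumption for exposition, not Contextual AI’s actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    doc_id: str
    page: int
    text: str                          # native text, OCR output, or a VLM caption
    embedding: list[float] = field(default_factory=list)

def parse(doc_id: str, pages: list[str]) -> list[Chunk]:
    # Stand-in for layout-aware parsing of text blocks, tables, and figures.
    return [Chunk(doc_id, i, p) for i, p in enumerate(pages)]

def embed(chunks: list[Chunk]) -> list[Chunk]:
    # Stand-in for a real embedding model: a toy character-frequency vector.
    for c in chunks:
        c.embedding = [c.text.count(ch) / max(len(c.text), 1) for ch in "etaoins"]
    return chunks

def ingest(doc_id: str, pages: list[str], index: dict) -> None:
    # Ingestion: parsed, embedded chunks land in a retrieval index.
    for c in embed(parse(doc_id, pages)):
        index[(c.doc_id, c.page)] = c

index: dict = {}
ingest("contract-001", ["Payment terms: net 30...", "Termination clause..."], index)
print(f"{len(index)} chunks indexed")
```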
Microsoft
Fun fact: ~6M hours of monthly traffic works out to roughly 1 year of conversations transcribed every hour!
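For the curious, the back-of-envelope math behind that (figures are approximate):

```python
# ~6M hours of audio transcribed per month, spread over a ~720-hour month.
audio_hours_per_month = 6_000_000
wall_clock_hours = 30 * 24                                  # ≈ 720 hours in a month
audio_per_hour = audio_hours_per_month / wall_clock_hours   # ≈ 8,333 hours of audio
years_per_hour = audio_per_hour / (365 * 24)                # ≈ 0.95 years
print(f"~{years_per_hour:.2f} years of audio transcribed per wall-clock hour")
```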
- Shipped and optimized state-of-the-art models transcribing millions of hours of conversations each month through Azure and Microsoft Teams APIs.
- Research engineering: data pipelines, distributed training frameworks, profiling, and inference optimization in ONNX/C++ (sketch after this list).
- Applied research: Scalability-focused architecture design, data & training recipes, error analysis, evaluation metrics
- Was one of the few non-PhD senior members on the team :)
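For flavor, here’s roughly what running an exported model with ONNX Runtime looks like (shown via the Python bindings for brevity; the model path, input shape, and feature dimensions below are placeholder assumptions, not the production stack):

```python
import numpy as np
import onnxruntime as ort

# Load an exported model (path is a placeholder).
sess = ort.InferenceSession("acoustic_model.onnx", providers=["CPUExecutionProvider"])

# Inspect the graph's expected input, then run a dummy feature batch.
inp = sess.get_inputs()[0]
print(inp.name, inp.shape, inp.type)

# e.g. a batch of 80-dim log-mel frames; real shapes depend on the model.
features = np.random.randn(1, 80, 100).astype(np.float32)
outputs = sess.run(None, {inp.name: features})
print([o.shape for o in outputs])
```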
Misc
- [2023] 🐥🗣️ Open-source contribution to whisper.cpp (38k stars): tinydiarize, a lightweight prototype extending OpenAI’s Whisper model with speaker diarization, runnable on MacBooks/iPhones.
- [2020] 🐋 Co-founded OrcaHello, a system for 24/7 monitoring of Southern Resident Killer Whales across a network of underwater hydrophones in the Pacific Northwest. It was awarded a $30,000 AI for Earth Innovation Grant in 2020 and has been operating live for >4 years - listen here.
