👋 Hi, I’m Akash, an applied researcher/engineer with experience in speech and audio (at Microsoft) and, most recently, multimodal document understanding and retrieval (at Contextual AI). Turns out this completes the AI multimodality trio of audio, vision & text. :)
I’m currently on a brief sabbatical, exploring ideas & tinkering as I work out what’s next. Right now that means real-time, on-device neural audio for music and voice.
Work
Contextual AI
Wrangled millions of pages to land the first millions of dollars in enterprise contracts :)
- 0→1: Built the core multimodal document understanding system (parsing, representation, and ingestion) powering retrieval for the context layer over enterprise documents (illustrative sketch after this list).
- Applied research: Vision models, VLM workflow/agent framework, document representation for agentic retrieval (demo below).
- Tech Lead / Manager: Mentored and managed engineers, interviewed candidates, and served as DRI coordinating with forward-deployed engineering, marketing, and PM.
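A minimal sketch of what a parse → represent → ingest pipeline like this can look like. Every name and stage here is an illustrative assumption for exposition, not Contextual AI’s actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    doc_id: str
    page: int
    text: str                          # native text, OCR output, or a VLM caption
    embedding: list[float] = field(default_factory=list)

def parse(doc_id: str, pages: list[str]) -> list[Chunk]:
    # Stand-in for layout-aware parsing of text blocks, tables, and figures.
    return [Chunk(doc_id, i, p) for i, p in enumerate(pages)]

def embed(chunks: list[Chunk]) -> list[Chunk]:
    # Stand-in for a real embedding model: a toy character-frequency vector.
    for c in chunks:
        c.embedding = [c.text.count(ch) / max(len(c.text), 1) for ch in "etaoins"]
    return chunks

def ingest(doc_id: str, pages: list[str], index: dict) -> None:
    # Ingestion: parsed, embedded chunks land in a retrieval index.
    for c in embed(parse(doc_id, pages)):
        index[(c.doc_id, c.page)] = c

index: dict = {}
ingest("contract-001", ["Payment terms: net 30...", "Termination clause..."], index)
print(f"{len(index)} chunks indexed")
```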
Microsoft
Fun fact: ~6M hours of monthly traffic works out to roughly 1 year of conversations transcribed every hour!
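For the curious, the back-of-envelope math behind that (figures are approximate):

```python
# ~6M hours of audio transcribed per month, spread over a ~720-hour month.
audio_hours_per_month = 6_000_000
wall_clock_hours = 30 * 24                                  # ≈ 720 hours in a month
audio_per_hour = audio_hours_per_month / wall_clock_hours   # ≈ 8,333 hours of audio
years_per_hour = audio_per_hour / (365 * 24)                # ≈ 0.95 years
print(f"~{years_per_hour:.2f} years of audio transcribed per wall-clock hour")
```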
- Shipped and optimized state-of-the-art models transcribing millions of hours of conversations each month through Azure and Microsoft Teams APIs.
- Research engineering: data pipelines, distributed training frameworks, profiling, and inference optimization in ONNX/C++ (sketch after this list).
- Applied research: Scalability-focused architecture design, data & training recipes, error analysis, evaluation metrics
- Was one of the few non-PhD senior members on the team :)
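For flavor, here’s roughly what running an exported model with ONNX Runtime looks like (shown via the Python bindings for brevity; the model path, input shape, and feature dimensions below are placeholder assumptions, not the production stack):

```python
import numpy as np
import onnxruntime as ort

# Load an exported model (path is a placeholder).
sess = ort.InferenceSession("acoustic_model.onnx", providers=["CPUExecutionProvider"])

# Inspect the graph's expected input, then run a dummy feature batch.
inp = sess.get_inputs()[0]
print(inp.name, inp.shape, inp.type)

# e.g. a batch of 80-dim log-mel frames; real shapes depend on the model.
features = np.random.randn(1, 80, 100).astype(np.float32)
outputs = sess.run(None, {inp.name: features})
print([o.shape for o in outputs])
```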
Misc
- [2023] 🐥🗣️ Open-source contribution to whisper.cpp (38k stars): tinydiarize, a lightweight prototype extending OpenAI’s Whisper model with speaker diarization, runnable on MacBooks/iPhones.
- [2020] 🐋 Co-founded OrcaHello, a system for 24/7 monitoring of Southern Resident Killer Whales across a network of underwater hydrophones in the Pacific Northwest. It was awarded a $30,000 AI for Earth Innovation Grant in 2020 and has been operating live for >4 years - listen here.
