About

👋 Hi, I’m Akash, an applied researcher/engineer with experience in speech and audio (at Microsoft) and, most recently, multi-modal document understanding and retrieval (at Contextual AI). Incidentally, this completes the multimodal AI trio of audio, vision & text. :)

I’m currently on a sabbatical. About 10 years after moving to the USA for grad school, I decided to take a break to reflect, recharge, and tinker before setting sail again. More on this here shortly!

Work

Contextual AI

[2024-25]

Wrangled millions of pages to land the first $ millions in enterprise contracts :)

  • Product development (0→1): RAG platform for knowledge agents
  • Applied research: Synthesis of long, complex documents; eval design
  • SWE things: Workflow/agent framework architecture, testing, observability, and scalability
  • Tech Lead Manager: Cross-company DRI (directly responsible individual); mentored a team of 3 and interviewed candidates

Microsoft

[2018-23]

Fun fact: ~6M hours of monthly traffic equals 1 *year* of conversations transcribed per hour!

  • Model development: state-of-the-art transcription designed for scale [O(1e7) hrs/month]
  • Applied research: diarized multi-speaker multi-mic transcription
    • Shipped a diarized in-conference-room transcription device covered by The Verge
    • Lead contributor: ASR training recipes, evaluation metrics, cross-system error analysis
  • Research engineering: data pipelines, optimizing distributed training and inference
    • Sped up O(1e20) FLOP training on low-cost V100 GPUs
    • Leveraged NVIDIA/ONNX profiling tools to fix bottlenecks in inference throughput

‘Graduated’ as one of the few non-speech-PhD senior members on the team :)

Misc

Open source

Other

  • [2016/17] Wrote case studies on the music streaming industry while studying business/tech strategy at Stanford MS&E.
  • [2014] Organized what was, at the time, Chennai’s largest EDM gig, with 5k+ attendees, during my undergrad at IIT Madras/Chennai.