Cognitive Collective

Helping you find your next career in AI. Learn more about the job board on the Scale blog.


ML Team Lead (Speech Synthesis)

Rime

Software Engineering, Data Science
San Francisco, CA, USA
Posted on Apr 18, 2025

Job Title: Team Lead - ML Scientist

Location: Hybrid + Flexible

Experience Level: Mid-Senior

About Us:
We are a speech ML startup at the forefront of innovation in audio and machine learning. Our mission is to make it easier for enterprise developers of high-impact voice applications to ship compelling experiences.

We’re looking for a highly motivated ML Engineer with a speech synthesis focus to join our dynamic team. This role offers the opportunity to solve complex engineering challenges, contribute to core product development, and grow into a leadership position as our team expands.

Responsibilities:

  • Build and maintain our products and platform on top of our cutting-edge voice models, which power hundreds of millions of conversations every month.

  • Train SOTA models to synthesize speech in any language for real-time conversations that power hundreds of millions of brand experiences every month.

  • Work closely with the product engineering team to expose and integrate key features of our models into the product.

  • Define the technical roadmap for developing multimodal assistive agents for products, working closely with technical leads and product managers, and drive execution from concept to deployment. You will provide guidance based on your domain knowledge and experience, ensuring alignment with overall deployment strategy.


Requirements:

  • You've spent a few years mastering one or two specific areas, such as vocoders, LLMs, speech encoders (w2v2, X-hubert, etc.), diffusion, or flow matching

  • Experience with distributed training across multiple GPU nodes

  • Experience processing extremely large amounts of text, audio, or video data for use in downstream experiments and training

  • Proficient with PyTorch

  • English language proficiency

Preferred Skills:

  • Understanding of the interface between text normalization and speech synthesis inference

  • Understanding of and/or experience with inference optimization techniques

  • Knowledge of current approaches to zero- and few-shot voice cloning

  • Experience with architecting and maintaining complex MLOps systems

  • Hands-on experience with LLM-based approaches to speech synthesis

  • Basic familiarity with full-duplex modeling of turn-taking (a.k.a. speech-to-speech modeling)


What We’re Looking For:

  • A self-starter with high initiative and the ability to work autonomously.

  • Proven track record of successfully building and deploying AI-powered solutions in an industry setting.

  • Excellent problem-solving skills and a solutions-oriented mindset.

  • Strong communication skills to collaborate effectively across teams.

  • An eye for the bigger picture and the ambition to take on leadership responsibilities in the near future.


What We Offer:

  • A collaborative and innovative work environment.

  • Full health benefits (health, dental, and vision).

  • Opportunities for growth and leadership as the team expands.

  • Competitive salary and equity options.

  • Flexible work hours and remote-first culture.

  • Appreciation for work-life balance.

  • Beautiful office in the heart of SF, close to public transit.

If you’re excited to work on the cutting edge of Voice AI and machine learning technology, we’d love to hear from you!