Perception Engineer
Tavus
About Us
At Tavus, we're building the human layer of AI. Our mission is to make human-AI interaction as natural as face-to-face interaction, enabling the human touch where it has been previously unscalable. We achieve this through pioneering research in multi-modal AI models for human perception and understanding, combined with state-of-the-art human avatar rendering and communication models. Our models power everything from text-to-video AI avatars to real-time conversational video experiences across industries like healthcare, recruiting, sales, education, and more. By enabling AI to see, hear, and communicate with human-like authenticity, we're creating the foundation for the next generation of AI employees, assistants, and companions.
We're a Series A company backed by top investors, including Sequoia, Y Combinator, and Scale VC. Join us in driving the future of human-AI interaction. Check it out for yourself 😎
The Role
We’re looking for a Perception Engineer to help advance the core visual understanding systems behind Tavus’ AI-generated video experiences. In this role, you’ll work on foundational models and systems that enable our avatars to "see" and interpret the world - from facial dynamics and motion tracking to scene understanding and multi-modal perception.
You’ll join a small, fast-moving applied ML team where experimentation is encouraged and ownership is expected. We’re not just iterating - we’re inventing. If you’re excited about solving real-world computer vision problems and shipping production-ready models that power next-gen human-AI interaction, we want to talk.
Your Mission 🚀
Develop and deploy perception models for tasks like emotion recognition, pose estimation, motion transfer, and scene parsing
Own the data pipelines and tooling necessary to train and evaluate these models at scale
Collaborate with researchers and engineers to integrate perception models into our real-time conversational product
Design and run experiments to optimize accuracy, speed, and robustness across diverse video conditions
Stay at the forefront of vision research and bring new ideas from paper to product
Requirements
2-3+ years of experience building computer vision or ML systems in production
Strong Python and PyTorch skills, with experience in real-time or low-latency video applications
Deep understanding of at least two of the following: facial recognition, emotion recognition, generative vision models, and 3D reconstruction
You’re a self-starter with a bias toward action and a passion for solving hard, ambiguous problems
Bonus if you have:
Experience building inference systems optimized for performance and scale
Background in multi-modal learning or fusing vision + audio inputs
Published or implemented research papers in computer vision or generative media
Played Portal 1 and 2 - or are willing to as part of onboarding 😄
Benefits
When you join Tavus, you’re joining a family. Our work is driven by our team, and our success is shared by all. This position comes with a flexible work schedule, unlimited PTO, competitive healthcare and gear stipends, and, of course, plenty of fun! At the end of the day, we want Tavus to be a place for you to learn, directly drive impact, and be with a team you love.
To learn more about our team culture and benefits, check out our hiring page!
Tavus is growing fast, and we’d like you to grow with us! Are you excited to get your hands dirty? Drop your resume and we’ll be in touch!
We are not looking for culture fits; we are looking for culture creators. In fact, diversity is what drives our success – it’s at the core of how we hire, communicate, and work. We are inclusive of all and combine our diverse backgrounds, skill sets, and thinking to build the best experiences for our clients.