Cognitive Collective

Helping you find your next career in AI. Learn more about the job board on the Scale blog.

Are you a scaling AI startup? Email to be added to our board.

Applied Scientist, Machine Learning



Software Engineering
New York, NY, USA
Posted on Saturday, April 13, 2024

Captions is the AI-powered creative studio. Millions of creators around the world have used Captions to make their video content stand out from the pack and we're on a mission to empower the next billion.

Based in NYC, we are a team of ambitious, experienced, and devoted engineers, designers, and marketers. You'll be joining an early team where you'll have an outsized impact on both the product and company's culture.

We’re very fortunate to have some the best investors and entrepreneurs backing us, including Kleiner Perkins, Sequoia Capital, Andreessen Horowitz, Uncommon Projects, Kevin Systrom, Mike Krieger, Antoine Martin, Julie Zhuo, Ben Rubin, Jaren Glover, SVAngel, 20VC, Ludlow Ventures, Chapter One, Lenny Rachitsky, and more.

Check out our latest milestone and our recent feature on the TODAY show and the New York Times.

** Please note that all of our roles will require you to be in-person at our NYC HQ (located in Union Square) **


  • Conduct research and develop models to advance the state-of-the-art in generative video technologies, focusing on areas such as video in-painting, super resolution, text-to-video conversion, background removal, and neural background rendering.

  • Design and develop advanced neural network models tailored for generative video applications, exploring innovative techniques to manipulate and enhance video content for storytelling purposes.

  • Explore new areas and techniques to enhance video storytelling, including research into novel generative approaches and their applications in video production and editing.

  • Create tools and systems that leverage machine learning, artificial intelligence, and computational techniques to generate, manipulate, and enhance video content, with a focus on usability and scalability.

Preferred Qualifications:

  • PhD in computer science or related field or 3+ years of industry experience.

  • Publication Record: Highly relevant publication history, with a focus on generative video techniques and applications. Ideal candidates will have served as the primary author on these publications.

  • Video Processing Skills: Strong understanding of video processing techniques, including video compression, motion estimation, and object tracking, with the ability to apply these techniques in generative video applications.

  • Expertise in Deep Learning: Proficiency in deep learning frameworks such as TensorFlow, PyTorch, or similar, with hands-on experience in designing, training, and deploying neural networks for video-related tasks.

  • Strong understanding of Computer Science fundamentals (algorithms and data structures).


  • Comprehensive medical, dental, and vision plans

  • Anything you need to do your best work

  • We’ve done team off-sites to places like Paris, London, Park City, Los Angeles, Upstate NY, and Nashville with more planned in the future.

Captions provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

Please note benefits apply to full time employees only.