Software Engineer, Distributed Training
Adept AI
This job is no longer accepting applications
Adept is working to advance a people-centric approach to AI that optimizes for what’s actually most useful for people and their work. You can see this approach in the technology we’re building: models that are trained to use software and take actions just as a person would.
We’ve recently raised a $350M Series B led by General Catalyst and Spark, on top of a $65M Series A in 2022 with Addition and Greylock. We’re fortunate to be supported by amazing firms and angels such as Chris Re, Andrej Karpathy, Root Ventures, Howie Liu, Dara Khosrowshahi, and others, and were recently highlighted by Forbes. Adept is backed by a coalition of strategic partners, including Atlassian, Microsoft, NVIDIA, and Workday.
We're looking for passionate team members who want to swing for the fences to accomplish our mission, are excited by a startup environment where the hardest problems are yet to be solved, and are eager to learn and collaborate together in our San Francisco office.
For more information, check out our blog!
Position Summary
Adept is building a new class of multimodal AI models designed specifically for digital agents. How fast we can iterate on models depends on training performance across some of the largest GPU clusters around, so engineers with a background in distributed deep learning (DL) training can help Adept iterate faster. Areas of focus on the Infrastructure team include:
- Compute - building and managing large GPU clusters for training state-of-the-art (SOTA) models
- Optimization - improving the DL training utilization and reliability of those clusters
- Research - working directly with researchers to align model architecture with training performance
We value curious engineers who can engage with new problems and get things done at a startup. Our team members come from a variety of backgrounds. If you have some of these, you might be a good fit:
- 8+ years of experience as a software engineer
- Expert understanding of distributed training concepts and tools, e.g., torch.distributed, NCCL, and MPI
- Experience with GPU cluster hardware, performance, interconnect, etc.
- Track record at fast-growing companies or startups
- Demonstrated end-to-end ownership and self-direction
- Comfort with moving fast and learning by doing
- Excellent communication and collaboration skills, both verbal and written
The pay range for this position in California is $175,000 to $350,000 per year; however, the base pay offered may vary depending on job-related knowledge, skills, candidate location, and experience. We also offer competitive equity packages in the form of stock options and a comprehensive benefits plan.
Our benefits
- Comprehensive health insurance coverage - 100% covered for employees
- Dental and vision insurance
- Unlimited vacation time for exempt employees
- 4 remote weeks per year - work from anywhere
- Competitive salary
- Stock options
- Daily meals for those in our comfortable SF office
- Commuter benefits
- Dog friendly