Cognitive Collective

Technical Program Manager, Reliability



IT, Operations
San Francisco, CA, USA
Posted on Wednesday, March 6, 2024

About the Team

The Applied team is responsible for the engineering systems behind OpenAI's ChatGPT and API products.

About the Role

As a Technical Program Manager, you will drive complex interdisciplinary research and engineering projects and programs across many teams.

We're looking for a TPM with experience running large-scale, complex technical programs in infrastructure engineering organizations. You're fluent in infrastructure engineering concepts and can work with external as well as internal partners to drive initiatives forward. You will work with researchers and engineers to initiate new projects, set ambitious goals and milestones, and drive execution across multiple teams.

You're rigorous but also fast in project execution. You're highly empathetic and people-oriented with strong communication skills. You care deeply about the mission.

This role is based in our San Francisco HQ. We offer relocation assistance to new employees.

In this role, you will:

  • Operationalize and execute critical cross-functional programs spanning many engineering teams as well as external infrastructure partners, including Azure teams.

  • Develop and implement comprehensive reliability and business continuity programs that align with organizational objectives and industry best practices.

  • Collaborate with engineering, operations, and other cross-functional teams to identify and mitigate risks to business operations and technology infrastructure.

  • Lead the design and execution of disaster recovery and incident response plans, ensuring rapid recovery and minimal impact on business operations.

  • Conduct regular risk assessments and business impact analyses to identify vulnerabilities and prioritize mitigation efforts.

  • Develop and implement program management frameworks and KPIs to achieve goals.

  • Develop and maintain metrics and reporting systems to monitor the effectiveness of reliability and business continuity programs.

  • Manage cross-functional projects to improve system reliability, reduce downtime, and enhance business continuity.

  • Create technical roadmaps with milestones and coordinate across teams to deliver against them at all stages of the project lifecycle.

  • Communicate progress, status, and risk effectively to stakeholders internally and externally.

  • Manage dependencies across multiple teams.

You might thrive in this role if you have:

  • 7+ years of experience managing complex, large-scale technical programs in infrastructure.

  • Strong track record of execution, delivering ambitious goals on complex cross-functional projects.

  • Proven experience as a Technical Program Manager or similar role, with a focus on reliability and business continuity.

  • Strong understanding of reliability engineering principles, disaster recovery planning, and business continuity management frameworks.

  • Experience in core infrastructure and cloud computing services.

  • Experience working in a fast-paced environment with continuously evolving priorities.

  • Ability to work with research and engineering teams to set ambitious goals and milestones.

  • Strong analytical and problem-solving skills, with the ability to identify risks and develop effective mitigation strategies.

  • Excellent communication and interpersonal skills, with the ability to collaborate effectively with stakeholders at all levels.

  • Ability to see around corners, anticipating and planning for risks.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.

For US-based candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.