Head of Evaluation

Harvey

San Francisco, CA, USA
Posted 6+ months ago

Why Harvey
Harvey will be a category-defining company at the application layer built on top of foundation models like GPT-4.
  • Exceptional product-market fit: multiple multi-million-dollar deals with the largest professional service providers (e.g. PwC) and the largest law firms on Earth (e.g. Allen & Overy).
  • Massive demand: 15,000 law firms on our waitlist.
  • World-class team: ex-DeepMind, Google Brain, FAIR, and Tesla Autopilot, plus former founding engineers at $1B+ startups like Superhuman and Glean.
  • Work directly with OpenAI to build the future of generative AI and redefine professional services.
  • Top-of-market cash and equity compensation.
Challenges
We are building systems that can automate the most complex knowledge work in the world, e.g. billion-dollar litigation and corporate transactions.
  • Handling the most sensitive data there is: client data from the world's largest companies.
  • Working past the edge of published AI research: tackling problems far beyond the complexity of existing AI benchmarks.
  • Unsolved product, architectural, and business problems: natural-language interfaces, prohibitively expensive model evaluation, massive marginal costs, and versioning, training, and segregating models per task, legal system, practice area, client, and each client’s clients.
Role
We are looking for a technical lead who can own the development of our evaluation platform. In this role, you will:
  • Build a team of 10–20 researchers and engineers with experience evaluating LLMs and large-scale AI systems.
  • Lead research and development of novel model-based evaluation methods and language model programs for evaluating complex tasks in legal and professional services.
  • Design and implement a red-teaming pipeline for our custom models and collaborate with other research teams to fine-tune models from human feedback.
  • Train reward models that accurately reflect the preferences of top-tier domain experts.
  • Experiment with synthetic data generation and LLM-based data augmentation to complement human-generated eval benchmarks.
Impact
  • Lead research and development of Harvey’s evaluation platform.
  • Contribute to a product that transforms the nature of professional services.
  • Help define what it means for LLMs to effectively perform complex knowledge work tasks.
  • Work directly with our founders, research, and product teams, as well as foundation model providers like OpenAI.
  • Tackle unsolved research and engineering problems, including some of the hardest problems in running LLMs in production.
Qualifications
  • 5+ years of experience leading highly technical teams composed of both researchers and engineers.
  • Experience evaluating large-scale AI systems in high-stakes settings.
  • Technical depth: can serve as tech lead and contribute substantially to our codebase as needed.
  • Ability to communicate complex technical outcomes to diverse stakeholders.
  • Strong conviction in setting technical direction.