Head of Evaluation

Harvey

San Francisco, CA, USA
Posted 6+ months ago

Why Harvey
Harvey will be a category-defining company at the application layer built on top of foundation models like GPT-4.
  • Exceptional product-market fit: multiple multi-million-dollar deals with the largest professional service providers (e.g. PwC) and the largest law firms on Earth (e.g. Allen & Overy).
  • Massive demand: 15,000 law firms on our waitlist.
  • World-class team: ex-DeepMind, Google Brain, FAIR, and Tesla Autopilot, plus former founding engineers at $1B+ startups like Superhuman and Glean.
  • Work directly with OpenAI to build the future of generative AI and redefine professional services.
  • Top-of-market cash and equity compensation.
Challenges
We are building systems that can automate the most complex knowledge work in the world, e.g. billion-dollar litigation and corporate transactions.
  • Handling the most sensitive data there is: client data from the world's largest companies.
  • Working past the edge of published AI research: tackling problems far beyond the complexity of existing AI benchmarks.
  • Unsolved product, architectural, and business problems: natural-language interfaces, prohibitively expensive model evaluation, massive marginal costs, and versioning, training, and segregating models per task, legal system, practice area, client, and each client’s clients.
Role
We are looking for a technical lead who can own the development of our evaluation platform. In this role, you will:
  • Build a team of 10–20 researchers and engineers with experience evaluating LLMs and large-scale AI systems.
  • Lead research and development of novel model-based evaluation methods and language model programs for evaluating complex tasks in legal and professional services.
  • Design and implement a red-teaming pipeline for our custom models and collaborate with other research teams to fine-tune models from human feedback.
  • Train reward models that accurately reflect the preferences of top-tier domain experts.
  • Experiment with synthetic data generation and LLM-based data augmentation to complement human-generated eval benchmarks.
Impact
  • Lead research and development of Harvey’s evaluation platform.
  • Contribute to a product that transforms the nature of professional services.
  • Help define what it means for LLMs to effectively perform complex knowledge work tasks.
  • Work directly with our founders, research, and product teams, as well as foundation model providers like OpenAI.
  • Tackle unsolved research and engineering problems, including some of the hardest problems in running LLMs in production.
Qualifications
  • 5+ years of experience leading highly technical teams composed of both researchers and engineers.
  • Experience evaluating large-scale AI systems in high-stakes settings.
  • Technical depth: can serve as tech lead and contribute substantially to our codebase as needed.
  • Ability to communicate complex technical outcomes to diverse stakeholders.
  • Strong conviction in setting technical direction.