Cognitive Collective


DevOps Engineer



Software Engineering
Tel Aviv-Yafo, Israel
Posted on Wednesday, August 10, 2022

Tel Aviv, Israel | Software Engineering | Full-Time

About Deepchecks

As machine learning systems begin to transition from the research phase to the production phase, it’s becoming clear that they have unique QA- and testing-related challenges.
Deepchecks offers a customizable, plug & play, algorithm-based solution for testing and monitoring machine learning systems.

Job Description

Deepchecks is a VC-backed startup tackling the huge problem of controlling Machine Learning systems.

AI systems are being adopted by more and more organizations and are taking an increasingly important role in their business. Although many resources are allocated to creating and optimizing the machine learning models, they still lack “common sense” and make various mistakes that may go undetected for long periods of time. We focus on detecting, preventing, and fixing these “AI glitches”, using mathematical concepts and algorithmic research. Our product monitors these systems in production, identifies a wide range of potential problems, and offers different types of alerts and explanations (depending on the type and the severity of the issue).

The startup was founded by two Talpiot graduates / Data Scientists and a leading professor in this field. Following a few months of R&D and initial customer traction, the time has come to expand our extremely talented (and fun!) team. Our offices are in Tel Aviv, although we’ve recently been working from home most of the time.

DevOps Engineer (Tel Aviv)

We’re looking for a top-notch cloud engineer to join us! As part of your job, you will be responsible for managing a massive Kubernetes platform, developing and improving CI/CD processes, and taking part in the design of CI/CD stages, and you will have strong influence on and responsibility for the core product architecture.

You will work in a dynamic environment where multiple projects will be active at once.

Your daily work will include a wide range of tools such as Kubernetes, Helm, Terraform, Prometheus, GitLab CI, Docker, Apache Kafka, Git, and Python, with a focus on managing Kubernetes resources in a high-scale environment.

Required Experience:

  1. 5+ years of experience working in a Linux environment with at least one language such as Golang, Python, Java, C, or C++ (preferably Python).
  2. 3+ years of experience with Docker & Kubernetes and Microservices Architecture.
  3. Knowledge and proven experience in Continuous Integration and Deployment (CI/CD) tools and methodologies.
  4. Experience with production systems that are characterized by high load, data, and machine learning processes.
  5. Proficiency with cloud environment deployments (GCP, AWS, Azure) and on-prem deployments (like Rancher).
  6. Proficiency with Linux OS and bash scripting.
  7. Experience in building internal monitoring stacks, for example: ELK, Prometheus, Grafana, etc.
  8. Strong design, architecture, and problem-solving experience.

Advantage if experienced with:

  1. MLOps
  2. Hands-on experience with common big data frameworks such as Kafka, Spark, Flink, etc.
  3. Rancher or other k8s on-prem deployment solutions
  4. Experience with infrastructure as code tools.
  5. Istio