Role Overview

This contract begins with a 1-month paid trial, which can then extend to 6+ months.

Job description:

We're hiring experienced DevOps engineers to author and validate Infrastructure-as-Code (IaC) tasks. You'll design realistic infrastructure scenarios, build the ground-truth solutions, and define the automated checks that grade whether an AI agent solved them correctly and safely. This is hands-on engineering and judgment work - you're encoding what "good" looks like for real infrastructure operations into verifiable tasks. If you've spent years writing Terraform, Pulumi, CloudFormation, or cloud CLI scripts for AWS or GCP, we want to work with you.

What you'll do:

Author IaC tasks grounded in real-world AWS and GCP scenarios.
Build ground-truth solutions for each task: correct, idempotent IaC that converges to the desired end state.
Design verifiable graders - automated checks that confirm an agent reached the correct end state.
Review and QA tasks authored by other engineers for correctness, difficulty calibration, and robustness.
Harden tasks against reward hacking.
Document task intent, assumptions, edge cases, and scoring rationale clearly.

Must-have qualifications:

4+ years in DevOps / cloud engineering.
Deep, hands-on Infrastructure-as-Code expertise: Terraform (required); Pulumi, CloudFormation, or CDK a plus.
Strong AWS and GCP depth.
Comfortable scripting in Python (and/or Bash) to build and automate validation.
A clear sense of correctness and verification - you can articulate, in code, what makes an infrastructure outcome right and safe.
Prior work in AI evaluation, benchmarking, or expert data annotation preferred.
Prior experience with LocalStack preferred.

DevOps Engineer (Full Time - Remote)
