
Python Infrastructure Engineer - Model Evaluation

Alignerr
Department: Android Developer
Type: REMOTE
Region: USA
Location: New York, NY
Experience: Mid-Senior level
Estimated Salary: $80,000 - $120,000
Skills: PYTHON, FULL-STACK DEVELOPMENT, SYSTEMS PROGRAMMING, MACHINE LEARNING, OBSERVABILITY, METRICS COLLECTION, DATA PIPELINES, EVALUATION HARNESSES, BACKEND SERVICES, DISTRIBUTED SYSTEMS

Job Description

Posted on: May 26, 2026

Python Infrastructure Engineer - Model Evaluation (AI Training)

About The Role

What if your Python expertise could directly shape how the world's most advanced AI models are built, evaluated, and improved? We're looking for a Senior Python Infrastructure Engineer to design and build the data pipelines, evaluation harnesses, and annotation tooling that power next-generation AI systems at leading research labs. This is a fully remote contract role with serious technical depth: the kind of work that ships to production and influences model quality at scale.

  • Organization: Alignerr
  • Type: Hourly Contract
  • Location: Remote
  • Commitment: 20–40 hours/week

What You'll Do

  • Design, build, and optimize high-performance Python systems supporting AI data pipelines and model evaluation workflows
  • Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
  • Build and maintain evaluation harnesses that integrate with inference frameworks and benchmarking pipelines
  • Improve reliability, performance, and safety across existing Python codebases
  • Instrument systems with observability tooling and metrics collection to monitor model performance and system health
  • Identify bottlenecks and edge cases in data and system behavior, and implement scalable, maintainable fixes
  • Collaborate with data, research, and engineering teams through synchronous design reviews and async communication

Who You Are

  • Native or fluent English speaker with clear written and verbal communication skills
  • 3–5+ years of professional experience writing production-grade Python
  • Full-stack developer with a strong systems programming background
  • Experienced building evaluation harnesses for ML models and integrating with inference frameworks
  • Strong grasp of observability, metrics collection, and system reliability practices
  • Able to commit 20–40 hours per week with consistent availability

Nice to Have

  • Prior experience with data annotation pipelines, data quality systems, or model evaluation infrastructure
  • Familiarity with AI/ML workflows, model training, or benchmarking frameworks
  • Experience with distributed systems or internal developer tooling
  • Background working directly with AI labs or ML research teams

Why Join Us

  • Work on real production systems at the frontier of AI development alongside leading research labs
  • Fully remote and flexible — work from wherever you do your best work
  • Freelance autonomy with the structure of high-impact, technically challenging projects
  • Make a direct, measurable contribution to how next-generation AI models are evaluated and improved
  • Potential for ongoing work and contract extension as new projects launch
Originally posted on LinkedIn
