Python Infrastructure Engineer - Model Evaluation

Alignerr

Department: Android Developer

Type: REMOTE

Region: USA

Location: New York, NY

Experience: Mid-Senior level

Estimated Salary: $80,000 - $120,000

Skills:

PYTHON, FULL-STACK DEVELOPMENT, SYSTEMS PROGRAMMING, MACHINE LEARNING, OBSERVABILITY, METRICS COLLECTION, DATA PIPELINES, EVALUATION HARNESSES, BACKEND SERVICES, DISTRIBUTED SYSTEMS

Job Description

Posted on: May 26, 2026

Python Infrastructure Engineer - Model Evaluation (AI Training)

About The Role

What if your Python expertise could directly shape how the world's most advanced AI models are built, evaluated, and improved? We're looking for a Senior Python Infrastructure Engineer to design and build the data pipelines, evaluation harnesses, and annotation tooling that power next-generation AI systems at leading research labs. This is a fully remote contract role with serious technical depth: the kind of work that ships to production and influences model quality at scale.

Organization: Alignerr
Type: Hourly Contract
Location: Remote
Commitment: 20–40 hours/week

What You'll Do

Design, build, and optimize high-performance Python systems supporting AI data pipelines and model evaluation workflows
Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
Build and maintain evaluation harnesses that integrate with inference frameworks and benchmarking pipelines
Improve reliability, performance, and safety across existing Python codebases
Instrument systems with observability tooling and metrics collection to monitor model performance and system health
Identify bottlenecks and edge cases in data and system behavior, and implement scalable, maintainable fixes
Collaborate with data, research, and engineering teams through synchronous design reviews and async communication

Who You Are

Native or fluent English speaker with clear written and verbal communication skills
3–5+ years of professional experience writing production-grade Python
Full-stack developer with a strong systems programming background
Experienced building evaluation harnesses for ML models and integrating with inference frameworks
Strong grasp of observability, metrics collection, and system reliability practices
Able to commit 20–40 hours per week with consistent availability

Nice to Have

Prior experience with data annotation pipelines, data quality systems, or model evaluation infrastructure
Familiarity with AI/ML workflows, model training, or benchmarking frameworks
Experience with distributed systems or internal developer tooling
Background working directly with AI labs or ML research teams

Why Join Us

Work on real production systems at the frontier of AI development alongside leading research labs
Fully remote and flexible — work from wherever you do your best work
Freelance autonomy with the structure of high-impact, technically challenging projects
Make a direct, measurable contribution to how next-generation AI models are evaluated and improved
Potential for ongoing work and contract extension as new projects launch

Originally posted on LinkedIn

