Software Engineer – Code Review & LLM Evaluation [1 Month Contract]

Work Type: Contractor | Fully Remote

Compensation: USD 50 – 125/hour

Hours: 10 to 40 hours/week (Partial PST overlap required)

Experience Required: 5 – 10 Years

Contract Duration: 1 Month (Extension based on performance)

Notice Period: Immediate preferred

Note

This is a contract-based, fully remote role.
Only citizens of, or holders of valid work permits in, the US, Canada, Australia, or Western Europe are eligible.
No medical benefits or paid leave.
Contractors must manage their own taxes and compliance.
Payment is based on actual hours worked.

Job Overview

We're seeking skilled fullstack engineers to join AI projects focused on improving how Large Language Models (LLMs) perform on real-world software tasks. Your work will help train, test, and validate LLMs: you will evaluate AI-generated code, contribute to agent-based applications, and collaborate with a team of engineers and researchers.

You will play a key role in building datasets, testing model responses, and pushing AI systems closer to real developer productivity.

Key Responsibilities

Contribute to LLM-focused projects that evaluate AI performance on realistic software engineering tasks
Build and lead agent use cases such as coding copilots, creative tools, or automation bots
Review and rank 3–4 AI-generated code solutions per task using a structured framework
Analyze code diffs for accuracy, efficiency, and readability
Construct fullstack tools for data pipeline support and internal testing environments
Identify and report edge cases in model outputs, providing well-structured rationale
Work closely with researchers and engineers to improve model behavior and code quality

Must-Have Skills

5+ years of hands-on software engineering experience, including strong fullstack development
Full-time (FTE only) experience at a top 50 tech company: 1+ years if US-based, 2+ years if located outside the US
Deep knowledge of software design, code review, debugging, and scalable systems
Proven expertise in building production-grade applications using modern frameworks
Excellent communication skills for writing clear evaluation rationales
Proficiency in Git workflows, JavaScript/Python, and cloud platforms (AWS, GCP, etc.)
