Position: PhD Rater
Type: Part-Time
Compensation: $50–$100/hour
Location: Remote
Commitment: 30+ hours/week (primarily weekdays)

Role Responsibilities
<ul><li>Design challenging, real-world STEM benchmark problems in domains such as data science, machine learning, finance, and software engineering.</li><li>Implement tasks within an agentic development environment using Python.</li><li>Create reproducible problem setups with clear specifications and executable tests.</li><li>Evaluate and analyze AI model behavior, including reasoning traces and agent workflows.</li><li>Diagnose reasoning failures, logic gaps, and problem-solving limitations in AI systems.</li><li>Contribute to improving benchmark quality and evaluation frameworks for frontier AI models.</li></ul>

Requirements
<ul><li>Active PhD student or recent PhD graduate.</li><li>Deep expertise in data science, machine learning, finance, and/or Python-based software development.</li><li>Strong research background in advanced STEM topics.</li><li>Ability to commit reliably for 30+ hours per week.</li><li>Demonstrated technical output such as high-quality open-source contributions or research work.</li><li>Ability to analyze agent behavior traces and diagnose failures beyond surface-level errors.</li></ul>

Application Process
<ul><li>Upload resume</li><li>Interview</li><li>Submit form</li></ul>

Data Scientist | Remote