Company: Mercor.
Type: Contract (Full-time or Part-time).
Location: Remote (Worldwide).
Language: Professional English required.
Compensation:
- $30–$90 USD/hour (depending on experience & evaluation performance).
- Weekly payments via Stripe or Wise.
- Flexible workload (project-based, scalable hours).
Mission:
- Work directly with leading AI teams to improve how large language models reason about code, systems design, and technical problem-solving.
- Evaluate and refine AI-generated responses, making them more accurate, reliable, and aligned with real-world engineering standards.
Responsibilities:
- Evaluate AI-generated answers to coding and systems design problems.
- Execute and validate code outputs.
- Identify bugs, inefficiencies, and incorrect reasoning (an illustrative sketch follows this list).
- Assess code quality & readability, algorithmic correctness, and systems design logic.
- Annotate responses with structured, actionable feedback.
- Follow defined evaluation frameworks and quality benchmarks.
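Example (illustrative only, not an actual Mercor task; the function names below are invented): a typical evaluation might involve spotting a subtle off-by-one in AI-generated Swift and proposing the fix as structured feedback.

    // AI-generated draft: intended to return the index of `target`.
    // Bug: with `low < high`, the element at `low == high` is never
    // examined, so the search can miss targets at either end.
    func brokenBinarySearch(_ array: [Int], _ target: Int) -> Int? {
        var low = 0
        var high = array.count - 1
        while low < high {                      // should be `low <= high`
            let mid = low + (high - low) / 2
            if array[mid] == target { return mid }
            if array[mid] < target { low = mid + 1 } else { high = mid - 1 }
        }
        return nil
    }

    // Correction an evaluator might propose:
    func binarySearch(_ array: [Int], _ target: Int) -> Int? {
        var low = 0
        var high = array.count - 1
        while low <= high {
            let mid = low + (high - low) / 2
            if array[mid] == target { return mid }
            if array[mid] < target { low = mid + 1 } else { high = mid - 1 }
        }
        return nil
    }

    // brokenBinarySearch([1, 3, 5], 5) -> nil (misses the last element)
    // binarySearch([1, 3, 5], 5)       -> Optional(2)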
Required Skills:
Core:
- Swift (expert level).
- Software Engineering (5+ years).
- Data Structures & Algorithms.
- Systems Design.
- Debugging & Code Review.
- Problem Solving (medium-to-hard difficulty).
Technical:
- Code Execution & Testing.
- API Design & Backend Logic.
- Performance Optimization.
- Version Control (Git).
AI / Evaluation Context:
- Experience using LLMs in development workflows.
- Ability to evaluate reasoning, not just outputs (see the sketch below).
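To illustrate "reasoning, not just outputs" (a hypothetical sketch; `maxPairSum` is an invented example, not a Mercor artifact): an AI answer can pass its own sample test while its justification is unsound, and the evaluator's job is to catch the flawed assumption rather than just check the printed result.

    // AI-generated draft: claims to return the largest sum of any two
    // elements. It passes the sample test below, but the reasoning
    // silently assumes the input is sorted in ascending order.
    func maxPairSum(_ values: [Int]) -> Int? {
        guard values.count >= 2 else { return nil }
        // Flawed assumption: the two largest elements sit at the end.
        return values[values.count - 1] + values[values.count - 2]
    }

    // Sample test the AI relied on (sorted, so the bug stays hidden):
    // maxPairSum([1, 2, 9, 10]) -> Optional(19)   correct
    // Counterexample an evaluator should surface:
    // maxPairSum([10, 9, 1, 2]) -> Optional(3)    wrong (expected 19)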
Nice-to-Have Skills:
- RLHF / AI Model Evaluation.
- Competitive Programming.
- Open-source contributions (merged PRs).
- Multi-language experience (Python, JavaScript, etc.).
- Technical writing / explaining complex concepts.
Ideal Candidate:
- Degree in Computer Science or related field (BS/MS/PhD).
- Strong real-world engineering background.
- Detail-oriented and highly analytical.
- Comfortable identifying subtle logic flaws and edge cases.
- Able to work independently in async environments.
What You Will Achieve:
- Improve the quality and reasoning of AI-generated code.
- Influence how AI systems assist developers globally.
- Deliver high-quality evaluation outputs that directly impact model performance.