Note: This is a remote position open to candidates in the USA. Cohere is a company dedicated to scaling intelligence to serve humanity by training and deploying frontier models for AI systems. The Senior Research Scientist, Model Evaluation will be responsible for creating next-generation evaluation methods and infrastructure to measure large language model (LLM) progress, working cross-functionally to improve evaluation techniques.
Responsibilities
• Create ambitious new evaluation benchmarks that push the limits of what our models can accomplish.
• Work on highly cross-functional teams to translate model feedback into trustworthy, repeatable evaluations.
• Conduct research to advance the state of the art in LLM evaluation methods, including training LLM judges, refining LLM-based data synthesis pipelines, and improving evaluation efficiency.
• Build scalable and reusable tools for digging into model performance.
Skills
• You enjoy rapidly building prototypes that demonstrate the boundaries of what LLMs are capable of, and you have developed resources to measure those capabilities.
• You have spent dozens of hours reviewing complex data and LLM outputs to ensure high data quality.
• You are obsessive about rigorously measuring AI capabilities, and also about making sure your measurements actually align with the capabilities you care about.
• You have strong software engineering skills.
Benefits
• An open and inclusive culture and work environment
• Work closely with a team on the cutting edge of AI research
• Weekly lunch stipend, in-office lunches & snacks
• Full health and dental benefits, including a separate budget to take care of your mental health
• 100% Parental Leave top-up for up to 6 months
• Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
• Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
• 6 weeks of vacation (30 working days!)
Company Overview
• Cohere is an enterprise AI firm developing secure and private AI technology to address real-world business challenges. It was founded in 2019 and is headquartered in Toronto, Ontario, Canada, with a workforce of 201-500 employees. Its website is https://cohere.com.
Company H1B Sponsorship
• Cohere has a track record of offering H1B sponsorships: 9 in 2025, 14 in 2024, 13 in 2023, 5 in 2022, and 2 in 2021. Please note that this does not guarantee sponsorship for this specific role.