AI Agent Evaluation Analyst (m/f/d)
Project info
- Period: 29.12.2025 - 25.04.2026
- Capacity: from 95%
- Daily rate: 200 - 320 €
- Location: Amsterdam, Netherlands
- Languages: Essential: German (Advanced); Desirable: English (Advanced)
- Remote: from 95%
Description
We are looking for a Freelance Agent Evaluation Analyst to take ownership of quality, structure, and insight across the project. This role goes far beyond task-checking - it’s about critical thinking, systems-level analysis, and ensuring clarity, reliability, and consistency at scale. You’ll work as both a hands-on evaluator and an analyst, collaborating with domain experts, delivery managers, and engineers. Beyond reviewing outputs, you’ll be expected to understand the “why” behind the work, identify logical gaps or inconsistencies, and propose meaningful improvements.
This is a flexible, impact-driven role where you’ll have space to grow, contribute ideas, and help shape how evaluation and quality are scaled across the project.
This role is especially well-suited for:
- Analysts, researchers, or consultants with strong structuring and reasoning skills
- Junior product managers or strategists curious about AI and evaluation work
- Smart problem-solvers (students or early-career professionals) who enjoy digging into logic, systems, and edge cases
You do not need a coding background. What matters most is curiosity, intellectual rigor, and the ability to evaluate complex setups with precision.
What you’ll be doing
- Fully own the QA pipeline for agent evaluation tasks;
- Review and validate tasks and golden paths created by scenario writers and experts;
- Spot logical inconsistencies, vague requirements, hidden risks, and unrealistic assumptions;
- Provide structured feedback and ensure quality alignment across contributors;
- Train, onboard, and mentor new QA team members;
- Collaborate with domain experts, delivery managers, and engineers to improve test clarity and coverage;
- Maintain and improve QA checklists, SOPs, and review guidelines;
- Contribute to test planning, prioritization, and quality benchmarks;
- Take initiative to suggest new approaches, tools, and processes that help scale validation and analysis.
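To illustrate the kind of review described above, the sketch below automates a few basic checks on a task definition before a human pass. It is purely hypothetical: the schema, field names ("goal", "golden_path", "success_criteria"), and the "vague wording" heuristic are invented for illustration and are not the project's actual format.

```python
import json

# Hypothetical task definition of the kind a scenario writer might submit.
# The field names are illustrative, not the project's real schema.
TASK = json.loads("""
{
  "id": "task-001",
  "goal": "Book a refundable flight under the stated budget",
  "golden_path": ["search_flights", "filter_refundable", "book_flight"],
  "success_criteria": ["booking confirmed", "price within budget"]
}
""")

REQUIRED_FIELDS = {"id", "goal", "golden_path", "success_criteria"}

def review_task(task: dict) -> list[str]:
    """Return a list of QA findings; an empty list means the task passes."""
    findings = []
    missing = REQUIRED_FIELDS - task.keys()
    if missing:
        findings.append(f"missing fields: {sorted(missing)}")
    if not task.get("golden_path"):
        findings.append("golden path is empty")
    # Vague wording is a common review finding; this is a crude proxy check.
    for criterion in task.get("success_criteria", []):
        if any(w in criterion.lower() for w in ("somehow", "etc", "appropriate")):
            findings.append(f"vague success criterion: {criterion!r}")
    return findings

print(review_task(TASK))  # → [] for a well-formed task
```

In practice, checks like these would only triage submissions; the judgment calls (hidden risks, unrealistic assumptions) remain manual review work.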
Requirements
What you should know / be able to do
- Strong analytical and critical thinking skills;
- Attention to detail and reliability - your work can be trusted without double-checking;
- Experience in manual QA, scenario validation, or similar analytical work;
- Comfortable working with structured formats (JSON/YAML);
- Clear written communication and documentation skills;
- Ability to give constructive feedback and coach others;
- Capable of working with a wide range of stakeholders: from engineers to directors/VPs.
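As a sketch of what "working with structured formats" can look like in this kind of role, the snippet below diffs an agent's actual tool-call trace against a golden path to produce a readable deviation report. The tool names and trace format are hypothetical, chosen only for illustration.

```python
import difflib

# Hypothetical traces: the expected golden path vs. what an agent actually ran.
golden = ["search_flights", "filter_refundable", "book_flight"]
actual = ["search_flights", "book_flight"]

# difflib yields a compact, human-readable deviation report for review notes.
report = list(difflib.unified_diff(golden, actual, lineterm="",
                                   fromfile="golden_path", tofile="actual_run"))
print("\n".join(report))
# A "-" line marks a golden-path step the agent skipped.
```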
Nice to have
- Background in scenario-based testing, test design, or annotation workflows;
- Experience with AI/LLM evaluation, prompt validation, or agent behavior testing;
- Some technical independence (e.g., Python skills);
- Familiarity with MCP / tool-based task execution;
- Experience working in cross-functional teams across product, delivery, and engineering.
Who you are
- Detail-obsessed but also able to see the bigger picture;
- Proactive and independent, taking true ownership of your work;
- Strong communicator who can turn complex findings into actionable insights;
- Flexible and motivated to contribute across a variety of tasks and projects;
- Convinced that quality is not just checking work, but making the whole product better.