Time's up! We are no longer accepting applications.
AI Agent Evaluation Analyst
Project info
- Period: 19.01.2026 – 18.03.2026
- Capacity: from 5%
- Daily rate: 120–360 €
- Language: English (Advanced)
- Remote: from 95%
Description
For an AI lab, we are looking for an AI Agent Evaluation Analyst to help train an AI model (a Large Language Model, LLM).
You help AI make sense of the world. As a consultant, you may be invited to take part in online projects to train the model in your domain of expertise.
This flexible role accommodates both experts seeking part-time engagement (a minimum of a few hours per week) and those interested in full-time opportunities.
Your tasks:
- Reviewing evaluation tasks and scenarios for logic, completeness, and realism.
- Identifying inconsistencies, missing assumptions, or unclear decision points.
- Helping define clear expected behaviors (gold standards) for AI agents.
- Annotating cause-effect relationships, reasoning paths, and plausible alternatives.
- Thinking through complex systems and policies as a human would to ensure agents are tested properly.
- Working closely with QA, writers, or developers to suggest refinements or edge case coverage.
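To make the "gold standard" and completeness-review ideas above concrete, here is a minimal, purely illustrative sketch (not the lab's actual tooling; all field names are hypothetical): an evaluation scenario represented as structured data, with its expected behavior spelled out, and a small check a reviewer might use to flag missing assumptions or fields.

```python
# Hypothetical example of an evaluation scenario for an AI agent.
# Field names (id, context, assumptions, expected_behavior, ...) are
# illustrative assumptions, not a real schema used on the project.

scenario = {
    "id": "refund-policy-001",
    "context": "Customer requests a refund 45 days after purchase; "
               "the store policy allows refunds within 30 days.",
    "assumptions": ["purchase date is verified", "item is non-defective"],
    # The "gold standard": the behavior the agent is expected to show.
    "expected_behavior": "Agent declines the refund, cites the 30-day "
                         "policy, and offers store credit as an alternative.",
    "plausible_alternatives": ["escalate to a human supervisor"],
}

# Fields a reviewer would expect every scenario to define.
REQUIRED_KEYS = {"id", "context", "assumptions", "expected_behavior"}

def find_gaps(s: dict) -> list[str]:
    """Return the required fields missing from a scenario, sorted."""
    return sorted(REQUIRED_KEYS - s.keys())

print(find_gaps(scenario))      # complete scenario -> []
print(find_gaps({"id": "x"}))   # incomplete scenario -> missing fields
```

In practice this kind of review is mostly human judgment (is the expected behavior unambiguous? are the assumptions realistic?), but simple structural checks like the one above help catch incomplete scenarios before they reach annotators.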
Requirements
- Excellent analytical thinking: Can reason about complex systems, scenarios, and logical implications.
- Strong attention to detail: Can spot contradictions, ambiguities, and vague requirements.
- Familiarity with structured data formats: Can read (though not necessarily write) JSON/YAML.
- Can assess scenarios holistically: What's missing, what's unrealistic, what might break?
- Experience with policy evaluation, logic puzzles, case studies, or structured scenario design.
- Background in consulting, academia, olympiads (e.g. logic/math/informatics), or research.
- Exposure to LLMs, prompt engineering, or AI-generated content.
- Familiarity with QA or test-case thinking (edge cases, failure modes, “what could go wrong”).
Application Process:
- If you are selected, you will be invited to an interview by Mindrift.