Recommended projects

AI Agent Evaluation Analyst (m/f/d)

We are looking for a Freelance Agent Evaluation Analyst to take ownership of quality, structure, and insight across the project. This role goes far beyond task-checking - it’s about critical thinking, systems-level analysis, and ensuring clarity, reliability, and consistency at scale. You’ll work as both a hands-on evaluator and an analyst, collaborating with domain experts, delivery managers, and engineers. Beyond reviewing outputs, you’ll be expected to understand the “why” behind the work, identify logical gaps or inconsistencies, and propose meaningful improvements. This is a flexible, impact-driven role where you’ll have space to grow, contribute ideas, and help shape how evaluation and quality are scaled across the project.

This role is especially well suited for:
  • Analysts, researchers, or consultants with strong structuring and reasoning skills
  • Junior product managers or strategists curious about AI and evaluation work
  • Smart problem-solvers (students or early-career professionals) who enjoy digging into logic, systems, and edge cases

You do not need a coding background. What matters most is curiosity, intellectual rigor, and the ability to evaluate complex setups with precision.

What you’ll be doing:
  • Fully own the QA pipeline for agent evaluation tasks;
  • Review and validate tasks and golden paths created by scenario writers and experts;
  • Spot logical inconsistencies, vague requirements, hidden risks, and unrealistic assumptions;
  • Provide structured feedback and ensure quality alignment across contributors;
  • Train, onboard, and mentor new QA team members;
  • Collaborate with domain experts, delivery managers, and engineers to improve test clarity and coverage;
  • Maintain and improve QA checklists, SOPs, and review guidelines;
  • Contribute to test planning, prioritization, and quality benchmarks;
  • Take initiative to suggest new approaches, tools, and processes that help scale validation and analysis.
AI Studio
Amsterdam, Netherlands
100% remote

AI Agent Evaluation Analyst

For an AI lab, we are looking for an AI Agent Evaluation Analyst to train an AI model (Large Language Model, LLM). You help AI make sense of the world. As a consultant, you may be invited to take part in online projects to train the model in your domain of expertise. This flexible role accommodates both experts seeking part-time engagement (minimum of a few hours per week) and those interested in full-time opportunities.

Responsibilities:
  • Reviewing evaluation tasks and scenarios for logic, completeness, and realism.
  • Identifying inconsistencies, missing assumptions, or unclear decision points.
  • Helping define clear expected behaviors (gold standards) for AI agents.
  • Annotating cause-effect relationships, reasoning paths, and plausible alternatives.
  • Thinking through complex systems and policies as a human would to ensure agents are tested properly.
  • Working closely with QA, writers, or developers to suggest refinements or edge-case coverage.
AI Lab
100% remote

AI Evaluation Consultant (m/w/d)

We are seeking an analytical and technically minded professional to:
  • Evaluate AI outputs and processes
  • Ensure quality, accuracy, and reliability
  • Identify logical errors, risks, and structural inconsistencies
  • Provide actionable insights and recommendations to the team

Ideal candidates:
  • Consultants, auditors, analysts, data researchers, or business/technical analysts with strong reasoning skills
  • Professionals curious about AI, process improvement, and quality evaluation
  • Problem-solvers who enjoy analyzing complex systems, logic, and scenarios

Key responsibilities:
  • Lead evaluation of AI outputs and related processes
  • Review tasks against expected/ideal scenarios; identify gaps and risks
  • Provide structured, actionable recommendations to engineers, domain experts, and managers
  • Maintain and improve evaluation guidelines, checklists, and SOPs
  • Suggest new approaches, tools, and processes to enhance AI evaluation
AI Labs
100% remote

Freelance AI Consultant (Japanese) (m/f/d)

For our client, we are looking for a Japanese-speaking AI consultant. As a consultant, you may be invited to take part in online projects to train the model in your domain of expertise. This flexible role accommodates both experts seeking part-time engagement (minimum of a few hours per week) and those interested in full-time opportunities.

Responsibilities:
  • Carefully review provided data (text, images, or videos).
  • Review tasks submitted by the development team and ensure quality assurance/quality control.
  • Label or classify content based on project guidelines.
  • Identify and flag factually incorrect, sensitive, inappropriate, or unclear material.
AI Studio
100% remote

Freelance AI Consultant (German) (m/w/d)

For our client, we are looking for a German-speaking AI consultant. As a consultant, you may be invited to take part in online projects to train the model in your area of expertise. This flexible role suits both experts seeking part-time work (minimum of a few hours per week) and those interested in full-time opportunities.

Responsibilities:
  • Carefully review provided data (text, images, or videos).
  • Review tasks submitted by the development team and ensure quality assurance/quality control.
  • Label or classify content based on project guidelines.
  • Identify and flag factually incorrect, sensitive, inappropriate, or unclear material.
AI Studio
100% remote

Freelance Consultant - AI Training (Portuguese-Speaking)

For an AI lab, we are looking for Portuguese-speaking freelance consultants to train an AI model (Large Language Model, LLM) in various domains. You help AI make sense of the world. As a consultant, you may be invited to take part in online projects to train the model in your domain of expertise. This flexible role accommodates both experts seeking part-time engagement (minimum of a few hours per week) and those interested in full-time opportunities.

Responsibilities:
  • Carefully review and analyze data produced by the AI in your domain of expertise.
  • Improve the model in your domain of expertise.
  • Review AI results and ensure quality assurance/quality control.
  • Label or classify content based on project guidelines.
AI Lab
100% remote

Freelance AI Consultant (Korean) (m/f/d)

For our client, we are looking for a Korean-speaking AI consultant. As a consultant, you may be invited to take part in online projects to train the model in your domain of expertise. This flexible role accommodates both experts seeking part-time engagement (minimum of a few hours per week) and those interested in full-time opportunities.

Responsibilities:
  • Carefully review provided data (text, images, or videos).
  • Review tasks submitted by the development team and ensure quality assurance/quality control.
  • Label or classify content based on project guidelines.
  • Identify and flag factually incorrect, sensitive, inappropriate, or unclear material.
AI Studio
100% remote

Test Manager (m/f/d)

The development and quality assurance of the data layer includes its complete provisioning through the respective web application. The data layer forms the central data foundation for analyzing user behavior and for personalized content during the website visit. To increase reliability and stability, automated tests should be used to significantly reduce manual regression testing. For this task, a Test Automation Engineer with a focus on Playwright (Elastic) is needed.

  • Development and implementation of automated end-to-end tests with the npm package @elastic/synthetics (Playwright) for data layer tests.
  • Analysis of existing test processes; identification and prioritization of automation potential.
  • Creation, maintenance, and optimization of test scripts in line with current best practices.
  • Integration of automated tests into existing CI/CD pipelines (e.g., Jenkins, GitHub Actions) to enable continuous test automation.
  • Documentation of test cases, test results, and test coverage in tools such as Jira and Confluence.
  • Advising stakeholders on the selection and introduction of appropriate test strategies, test tools, and frameworks.
  • Conducting code reviews of test automation scripts to improve quality and maintainability.
  • Preparing decision templates and recommendations for action to further develop test automation.
  • Advising on error analysis and resolution within test automation.
  • Consulting on setting up reports and alerting with Elastic Observability.
  • Promoting traceability and reproducibility of test results.
Telecommunications
Munich, Germany
100% remote
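The data-layer checks the Test Manager listing describes could, for illustration, look like the small validation helper below. The event name `page_view` and the field names are hypothetical placeholders, not the client's actual schema; in practice the snapshot would be read inside an @elastic/synthetics (Playwright) journey rather than hard-coded.

```typescript
// Hypothetical sketch of a data-layer validation helper for automated
// end-to-end tests. In an @elastic/synthetics journey the snapshot would
// come from the page, e.g.:
//   const events = await page.evaluate(() => (window as any).dataLayer);

type DataLayerEvent = Record<string, unknown>;

// Returns a list of problems found; an empty list means the snapshot
// contains the required event with all required fields.
function validateDataLayer(
  events: DataLayerEvent[],
  requiredEvent: string,
  requiredFields: string[],
): string[] {
  const problems: string[] = [];
  const match = events.find((e) => e["event"] === requiredEvent);
  if (!match) {
    problems.push(`missing event "${requiredEvent}"`);
    return problems;
  }
  for (const field of requiredFields) {
    if (!(field in match)) {
      problems.push(`event "${requiredEvent}" lacks field "${field}"`);
    }
  }
  return problems;
}

// Illustrative snapshot: a page-view event carrying a page path and a
// consent flag (both field names are assumptions for this sketch).
const snapshot: DataLayerEvent[] = [
  { event: "page_view", pagePath: "/home", consentGiven: true },
];
console.log(validateDataLayer(snapshot, "page_view", ["pagePath", "consentGiven"]));
```

Keeping the assertion logic in a pure helper like this makes it easy to unit-test the checks themselves, separately from the browser journey that collects the snapshot.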

Freelance Automotive Engineer (with Python) - Quality Assurance / AI Trainer

Generative AI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. Although every project is unique, you might typically:
  • Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in Automotive Engineering, while also developing references and examples of tasks.
  • Expert Acquisition: Assess the qualification tests of experts, ensuring their competency.
  • Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines.
  • Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote

Freelance Mathematics Expert for AI Model Training (m/f/d)

An AI lab is looking for a freelance mathematics expert to evaluate AI models. The goal of the project is to assess the performance, accuracy, and reliability of AI models applied in mathematics contexts. The role involves working closely with the development team to ensure the models meet industry standards and provide actionable insights. This is a remote part-time role that can be flexibly tailored to your availability – from just a few hours per week to full-time.

Key responsibilities:
  • Evaluate AI models for mathematics applications.
  • Analyze model outputs and provide feedback for improvement.
  • Collaborate with the development team to ensure alignment with industry standards.
  • Document findings and recommendations for model optimization.
  • Conduct tests to validate model performance and reliability.
AI Lab
100% remote

Freelance Chemistry Expert for AI Model Training (m/f/d)

An AI lab is looking for a freelance chemistry expert to evaluate AI models. The goal of the project is to assess the performance, accuracy, and reliability of AI models applied in chemistry contexts. The role involves working closely with the development team to ensure the models meet industry standards and provide actionable insights. This is a remote part-time role that can be flexibly tailored to your availability – from just a few hours per week to full-time.

Key responsibilities:
  • Evaluate AI models for chemistry applications.
  • Analyze model outputs and provide feedback for improvement.
  • Collaborate with the development team to ensure alignment with industry standards.
  • Document findings and recommendations for model optimization.
  • Conduct tests to validate model performance and reliability.
AI Lab
100% remote

Freelance Physics Expert for AI Model Training (m/f/d)

An AI lab is looking for a freelance physics expert to evaluate AI models. The goal of the project is to assess the performance, accuracy, and reliability of AI models applied in physics contexts. The role involves working closely with the development team to ensure the models meet industry standards and provide actionable insights. This is a remote part-time role that can be flexibly tailored to your availability – from just a few hours per week to full-time.

Key responsibilities:
  • Evaluate AI models for physics applications.
  • Analyze model outputs and provide feedback for improvement.
  • Collaborate with the development team to ensure alignment with industry standards.
  • Document findings and recommendations for model optimization.
  • Conduct tests to validate model performance and reliability.
AI Lab
100% remote

Freelance AI Consultant (Chinese) (m/f/d)

For our client, we are looking for a Chinese-speaking AI consultant. As a consultant, you may be invited to take part in online projects to train the model in your area of expertise. This flexible role accommodates both experts seeking part-time engagement (minimum of a few hours per week) and those interested in full-time opportunities.

Responsibilities:
  • Carefully review provided data (text, images, or videos).
  • Review tasks submitted by the development team and ensure quality assurance/quality control.
  • Label or classify content based on project guidelines.
  • Identify and flag factually incorrect, sensitive, inappropriate, or unclear material.
AI Studio
100% remote

Freelance Civil Engineer with Python Experience (m/f/d)

A company is looking for a freelance civil engineering expert to evaluate AI models. The goal of the project is to assess the performance, accuracy, and reliability of AI models applied in civil engineering contexts. The role involves working closely with the development team to ensure the models meet industry standards and provide actionable insights.

Key responsibilities:
  • Evaluate AI models for civil engineering applications.
  • Analyze model outputs and provide feedback for improvement.
  • Collaborate with the development team to ensure alignment with industry standards.
  • Document findings and recommendations for model optimization.
  • Conduct tests to validate model performance and reliability.
AI Lab
100% remote

Freelance Electrical Engineer with Python Experience (m/w/d)

Generative AI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. Although every project is unique, you might typically:
  • Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in Electrical Engineering, while also developing references and examples of tasks.
  • Expert Acquisition: Assess the qualification tests of experts, ensuring their competency.
  • Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines.
  • Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote

Freelance Mechanical Engineer with Python Experience (m/w/d)

For an AI lab, we are looking for a Mechanical Engineer with Python experience to train an AI model (Large Language Model, LLM). GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join, you’ll have the opportunity to collaborate on these projects. Although every project is unique, you might typically:
  • Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in Mechanical Engineering, while also developing references and examples of tasks.
  • Expert Acquisition: Assess the qualification tests of experts, ensuring their competency.
  • Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines.
  • Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote

Freelance Kotlin Developer (m/w/d)

For an AI lab, we are looking for a Kotlin Developer to train an AI model (Large Language Model, LLM). You help AI make sense of the world. As a consultant, you may be invited to take part in online projects to train the model in your domain of expertise. This flexible role accommodates both experts seeking part-time engagement (minimum of a few hours per week) and those interested in full-time opportunities.

  • Code generation and code review
  • Prompt evaluation and complex data annotation
  • Training and evaluation of large language models
  • Benchmarking and agent-based code execution in sandboxed environments
  • Working across multiple programming languages
  • Adapting guidelines for new domains and use cases
  • Following project-specific rubrics and requirements
  • Collaborating with project leads, solution engineers, and supply managers on complex or experimental projects
AI Lab
100% remote

Freelance Rust Developer (m/w/d)

For an AI lab, we are looking for a Rust Developer to train an AI model (Large Language Model, LLM). You help AI make sense of the world. As a consultant, you may be invited to take part in online projects to train the model in your domain of expertise. This flexible role accommodates both experts seeking part-time engagement (minimum of a few hours per week) and those interested in full-time opportunities.

  • Code generation and code review
  • Prompt evaluation and complex data annotation
  • Training and evaluation of large language models
  • Benchmarking and agent-based code execution in sandboxed environments
  • Working across multiple programming languages (Python, JavaScript/TypeScript, Rust, SQL, etc.)
  • Adapting guidelines for new domains and use cases
  • Following project-specific rubrics and requirements
  • Collaborating with project leads, solution engineers, and supply managers on complex or experimental projects
AI Lab
100% remote

Freelance Physics Expert (with Python) - Quality Assurance / AI Trainer

Generative AI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. Although every project is unique, you might typically:
  • Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in Physics, while also developing references and examples of tasks.
  • Expert Acquisition: Assess the qualification tests of experts, ensuring their competency.
  • Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines.
  • Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote

Chemist with Python Experience (m/f/d)

GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join the platform as an AI Tutor in Chemistry, you’ll have the opportunity to collaborate on these projects. Although every project is unique, you might typically:
  • Generate prompts that challenge AI.
  • Define comprehensive scoring criteria to evaluate the accuracy of the AI’s answers.
  • Correct the model’s responses based on your domain-specific knowledge.
AI Lab
100% remote

Adobe Target Consultant (m/f/d)

The Digital Analytics department uses the Adobe Experience Cloud to implement personalized user experiences. The goal is to increase conversion rates and improve the customer experience through targeted personalization and testing. The technical implementation is carried out independently by specialized consultants.

  • Design and technical implementation of personalized user experiences with Adobe Target
  • Implementation of A/B and multivariate tests, including quality assurance in live environments
  • Integration of Adobe Target into complex system landscapes, including Adobe Experience Platform and Adobe Experience Manager
  • Development of targeting logic based on segments, real-time data, and user behavior
  • Advice on new features and best practices within the Adobe Experience Cloud
  • Preparation of technical decision templates for the development of the personalization strategy
  • Documentation and handover of developed solutions in Confluence and similar tools
  • Preparation of technical materials for data protection, especially data flow diagrams
  • Advice on GDPR-compliant use of Adobe Target and related systems
  • Conducting workshops to introduce Adobe Target
Telecommunications
100% remote

Frontend Developer for an HR Platform (Angular experience)

Reach out to us if you are interested in working with us on the project.
FRATCH
Munich
90% remote

AI Agent Evaluation Analyst (m/f/d)

Industry
Information Technology (IT)
Area
Quality Assurance (QA)

Project info

  • Period
    08.12.2025 - 04.04.2026
  • Capacity
    from 95%
  • Daily rate
    200 - 320€
  • Location
    Amsterdam, Netherlands
  • Languages
    Essential:
    • German
      (Advanced)
    Desirable:
    • English
      (Advanced)
  • Remote
    from 95%

Description

We are looking for a Freelance Agent Evaluation Analyst to take ownership of quality, structure, and insight across the project. This role goes far beyond task-checking - it’s about critical thinking, systems-level analysis, and ensuring clarity, reliability, and consistency at scale. You’ll work as both a hands-on evaluator and an analyst, collaborating with domain experts, delivery managers, and engineers. Beyond reviewing outputs, you’ll be expected to understand the “why” behind the work, identify logical gaps or inconsistencies, and propose meaningful improvements.

This is a flexible, impact-driven role where you’ll have space to grow, contribute ideas, and help shape how evaluation and quality are scaled across the project.

This role is especially well suited for:

  • Analysts, researchers, or consultants with strong structuring and reasoning skills
  • Junior product managers or strategists curious about AI and evaluation work
  • Smart problem-solvers (students or early-career professionals) who enjoy digging into logic, systems, and edge cases

You do not need a coding background. What matters most is curiosity, intellectual rigor, and the ability to evaluate complex setups with precision.

What you’ll be doing

  • Fully own the QA pipeline for agent evaluation tasks;
  • Review and validate tasks and golden paths created by scenario writers and experts;
  • Spot logical inconsistencies, vague requirements, hidden risks, and unrealistic assumptions;
  • Provide structured feedback and ensure quality alignment across contributors;
  • Train, onboard, and mentor new QA team members;
  • Collaborate with domain experts, delivery managers, and engineers to improve test clarity and coverage;
  • Maintain and improve QA checklists, SOPs, and review guidelines;
  • Contribute to test planning, prioritization, and quality benchmarks;
  • Take initiative to suggest new approaches, tools, and processes that help scale validation and analysis.

Requirements

What you should know / be able to do

  • Strong analytical and critical thinking skills;
  • Attention to detail and reliability - your work can be trusted without double-checking;
  • Experience in manual QA, scenario validation, or similar analytical work;
  • Comfortable working with structured formats (JSON/YAML);
  • Clear written communication and documentation skills;
  • Ability to give constructive feedback and coach others;
  • Capable of working with a wide range of stakeholders: from engineers to directors/VPs.

Nice to have

  • Background in scenario-based testing, test design, or annotation workflows;
  • Experience with AI/LLM evaluation, prompt validation, or agent behavior testing;
  • Some technical independence (e.g., Python skills);
  • Familiarity with MCP / tool-based task execution;
  • Experience working in cross-functional teams across product, delivery, and engineering.

Who you are

  • Detail-obsessed but also able to see the bigger picture;
  • Proactive, independent, and take true ownership of your work;
  • Strong communicator who can turn complex findings into actionable insights;
  • Flexible and motivated to contribute across a variety of tasks and projects;
  • Believe quality is not just checking work, but making the whole product better.