Project details

Recommended projects

AI Agent Evaluation Analyst (m/f/d)

We are looking for a Freelance Agent Evaluation Analyst to take ownership of quality, structure, and insight across the project. This role goes far beyond task-checking - it’s about critical thinking, systems-level analysis, and ensuring clarity, reliability, and consistency at scale. You’ll work as both a hands-on evaluator and an analyst, collaborating with domain experts, delivery managers, and engineers. Beyond reviewing outputs, you’ll be expected to understand the “why” behind the work, identify logical gaps or inconsistencies, and propose meaningful improvements. This is a flexible, impact-driven role where you’ll have space to grow, contribute ideas, and help shape how evaluation and quality are scaled across the project. This role is especially well-suited for: analysts, researchers, or consultants with strong structuring and reasoning skills; junior product managers or strategists curious about AI and evaluation work; and smart problem-solvers (students or early-career professionals) who enjoy digging into logic, systems, and edge cases. You do not need a coding background. What matters most is curiosity, intellectual rigor, and the ability to evaluate complex setups with precision.
What you’ll be doing:
  • Fully own the QA pipeline for agent evaluation tasks;
  • Review and validate tasks and golden paths created by scenario writers and experts;
  • Spot logical inconsistencies, vague requirements, hidden risks, and unrealistic assumptions;
  • Provide structured feedback and ensure quality alignment across contributors;
  • Train, onboard, and mentor new QA team members;
  • Collaborate with domain experts, delivery managers, and engineers to improve test clarity and coverage;
  • Maintain and improve QA checklists, SOPs, and review guidelines;
  • Contribute to test planning, prioritization, and quality benchmarks;
  • Take initiative to suggest new approaches, tools, and processes that help scale validation and analysis.
AI Studio
Amsterdam, Netherlands
100% remote
New

Freelance Data Annotator QA (German)

For our client, we are looking for a German-speaking data annotation specialist. Annotation is what helps AI make sense of the world. As a QA Annotator, you may be invited to take part in online projects such as rating AI-generated content, evaluating factual accuracy, or comparing responses — when projects are available.
Responsibilities:
  • Carefully review provided data (text, images, or videos).
  • Review tasks submitted by the Annotators team and ensure quality assurance/quality control.
  • Label or classify content based on project guidelines.
  • Identify and flag factually incorrect, sensitive, inappropriate, or unclear material.
AI Studio
100% remote

Freelance Annotator QA (Japanese)

For our client, we are looking for a Japanese-speaking data annotation specialist. Annotation is what helps AI make sense of the world. As a QA Annotator, you may be invited to take part in online projects such as rating AI-generated content, evaluating factual accuracy, or comparing responses — when projects are available.
Responsibilities:
  • Carefully review provided data (text, images, or videos).
  • Review tasks submitted by the Annotators team and ensure quality assurance/quality control.
  • Label or classify content based on project guidelines.
  • Identify and flag factually incorrect, sensitive, inappropriate, or unclear material.
AI Studio
100% remote
New

Freelance Consultant - AI Training (Portuguese-Speaking)

For an AI lab, we are looking for Portuguese-speaking freelance consultants to train an AI model (Large Language Model - LLM) in various domains. You help AI to make sense of the world. As a consultant, you may be invited to take part in online projects to train the model in your domain of expertise. This flexible role accommodates both experts seeking part-time engagement (minimum a few hours/week) and those interested in full-time opportunities.
Responsibilities:
  • Carefully review and analyze data provided by the AI in your domain of expertise.
  • Improve the model in your domain of expertise.
  • Review AI results and ensure quality assurance/quality control.
  • Label or classify content based on project guidelines.
AI Lab
100% remote
New

Freelance Data Annotator (Korean) (m/f/d)

For our client, we are looking for a Korean-speaking data annotation specialist. Annotation is what helps AI make sense of the world. As a QA Annotator, you may be invited to take part in online projects such as rating AI-generated content, evaluating factual accuracy, or comparing responses — when projects are available. This flexible role accommodates both experts seeking part-time engagement (minimum a few hours/week) and those interested in full-time opportunities.
Responsibilities:
  • Carefully review provided data (text, images, or videos).
  • Review tasks submitted by the Annotators team and ensure quality assurance/quality control.
  • Label or classify content based on project guidelines.
  • Identify and flag factually incorrect, sensitive, inappropriate, or unclear material.
AI Studio
100% remote

Freelance Automotive Engineer (with Python) - Quality Assurance / AI Trainer

Generative AI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. Although every project is unique, you might typically:
  • Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in automotive engineering, while also developing references and examples of tasks.
  • Expert Acquisition: Assess the qualification tests of experts, ensuring their competency.
  • Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines.
  • Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote

Freelance Electrical Engineer (with Python) - Quality Assurance / AI Trainer

Generative AI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. Although every project is unique, you might typically:
  • Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in electrical engineering, while also developing references and examples of tasks.
  • Expert Acquisition: Assess the qualification tests of experts, ensuring their competency.
  • Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines.
  • Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote

Freelance Mechanical Engineer (with Python) - Quality Assurance (AI Trainer)

Generative AI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. Although every project is unique, you might typically:
  • Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in mechanical engineering, while also developing references and examples of tasks.
  • Expert Acquisition: Assess the qualification tests of experts, ensuring their competency.
  • Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines.
  • Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote
New

Freelance Mathematics Expert for AI Model Training (m/f/d)

An AI lab is looking for freelance mathematics experts to evaluate AI models. The goal of the project is to assess the performance, accuracy, and reliability of AI models applied in mathematics contexts. The role involves working closely with the development team to ensure the models meet industry standards and provide actionable insights. This is a remote part-time role that can be flexibly tailored to your availability – from just a few hours per week to full-time.
Key responsibilities:
  • Evaluate AI models for mathematics applications.
  • Analyze model outputs and provide feedback for improvement.
  • Collaborate with the development team to ensure alignment with industry standards.
  • Document findings and recommendations for model optimization.
  • Conduct tests to validate model performance and reliability.
AI Lab
100% remote
New

Freelance Chemistry Expert for AI Model Training (m/f/d)

An AI lab is looking for freelance chemistry experts to evaluate AI models. The goal of the project is to assess the performance, accuracy, and reliability of AI models applied in chemistry contexts. The role involves working closely with the development team to ensure the models meet industry standards and provide actionable insights. This is a remote part-time role that can be flexibly tailored to your availability – from just a few hours per week to full-time.
Key responsibilities:
  • Evaluate AI models for chemistry applications.
  • Analyze model outputs and provide feedback for improvement.
  • Collaborate with the development team to ensure alignment with industry standards.
  • Document findings and recommendations for model optimization.
  • Conduct tests to validate model performance and reliability.
AI Lab
100% remote
New

Freelance Physics Expert for AI Model Training (m/f/d)

An AI lab is looking for freelance physics experts to evaluate AI models. The goal of the project is to assess the performance, accuracy, and reliability of AI models applied in physics contexts. The role involves working closely with the development team to ensure the models meet industry standards and provide actionable insights. This is a remote part-time role that can be flexibly tailored to your availability – from just a few hours per week to full-time.
Key responsibilities:
  • Evaluate AI models for physics applications.
  • Analyze model outputs and provide feedback for improvement.
  • Collaborate with the development team to ensure alignment with industry standards.
  • Document findings and recommendations for model optimization.
  • Conduct tests to validate model performance and reliability.
AI Lab
100% remote
New

Freelance Electrical Engineer for AI Model Training (m/f/d)

A company is looking for a freelance electrical engineering expert to evaluate AI models. The goal of the project is to assess the performance, accuracy, and reliability of AI models applied in electrical engineering contexts. The role involves working closely with the development team to ensure the models meet industry standards and provide actionable insights.
Key responsibilities:
  • Evaluate AI models for electrical engineering applications.
  • Analyze model outputs and provide feedback for improvement.
  • Collaborate with the development team to ensure alignment with industry standards.
  • Document findings and recommendations for model optimization.
  • Conduct tests to validate model performance and reliability.
AI Lab
100% remote
New

Freelance Mechanical Engineer for AI Model Training (m/f/d)

A company is looking for a freelance mechanical engineering expert to evaluate AI models. The goal of the project is to assess the performance, accuracy, and reliability of AI models applied in mechanical engineering contexts. The role involves working closely with the development team to ensure the models meet industry standards and provide actionable insights.
Key responsibilities:
  • Evaluate AI models for mechanical engineering applications.
  • Analyze model outputs and provide feedback for improvement.
  • Collaborate with the development team to ensure alignment with industry standards.
  • Document findings and recommendations for model optimization.
  • Conduct tests to validate model performance and reliability.
AI Lab
100% remote
New

Freelance Civil Engineer for AI Model Training (m/f/d)

A company is looking for a freelance civil engineering expert to evaluate AI models. The goal of the project is to assess the performance, accuracy, and reliability of AI models applied in civil engineering contexts. The role involves working closely with the development team to ensure the models meet industry standards and provide actionable insights.
Key responsibilities:
  • Evaluate AI models for civil engineering applications.
  • Analyze model outputs and provide feedback for improvement.
  • Collaborate with the development team to ensure alignment with industry standards.
  • Document findings and recommendations for model optimization.
  • Conduct tests to validate model performance and reliability.
AI Lab
100% remote
New

Freelance Data Annotator (Chinese) (m/f/d)

For an AI studio, we are looking for a Chinese-speaking data annotation specialist. Annotation is what helps AI make sense of the world. As a QA Annotator, you may be invited to take part in online projects such as rating AI-generated content, evaluating factual accuracy, or comparing responses — when projects are available. This flexible role accommodates both experts seeking part-time engagement (minimum a few hours/week) and those interested in full-time opportunities.
Responsibilities:
  • Carefully review provided data (text, images, or videos).
  • Review tasks submitted by the Annotators team and ensure quality assurance/quality control.
  • Label or classify content based on project guidelines.
  • Identify and flag factually incorrect, sensitive, inappropriate, or unclear material.
AI Studio
100% remote

Head of Electronic Development (m/f/d)

A company is looking for an experienced Head of Electronic Development who will take on the technical and disciplinary leadership of a hardware and software development team. The goal of the project is to ensure the quality of technical deliverables, plan and execute strategic research and development, and optimize existing processes and systems. The role includes participating in agile project management, cost estimation and proposal creation, as well as team and capacity planning. The candidate will be responsible for the professional development of team members and will communicate closely with customers and suppliers.
Key tasks:
  • Technical and disciplinary leadership of a HW+SW development team
  • Participation in agile project management, cost estimation and proposal creation
  • Team and capacity planning
  • Professional development of team members
  • Responsibility for the quality of formal and technical deliverables
  • Cost calculation and pricing of products, especially PCBA
  • Planning and execution of strategic research and development
  • Communication with customers and suppliers
  • Optimization of existing processes and systems
Manufacturing
Frankfurt, Germany
100% remote

Freelance Physics Expert (with Python) - Quality Assurance / AI Trainer

Generative AI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. Although every project is unique, you might typically:
  • Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in Physics, while also developing references and examples of tasks.
  • Expert Acquisition: Assess the qualification tests of experts, ensuring their competency.
  • Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines.
  • Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote
New

AI Trainer for Vibe Coding (m/w/d)

An AI Lab is looking for an AI Trainer for Vibe Coding. This role involves producing accurate, well-reasoned outputs across diverse domains, leveraging automation and AI tools. The position requires expertise in coding and optimizing Python scripts, handling large datasets, improving AI-generated content, and formatting and troubleshooting technical workflows. This is a remote part-time role that can be flexibly tailored to your availability – from just a few hours per week to full-time.
Key responsibilities:
  • Develop and optimize Python scripts for automation and AI tasks.
  • Handle and analyze large datasets efficiently.
  • Improve and refine AI-generated content for accuracy and quality.
  • Format and troubleshoot technical workflows to ensure smooth operations.
  • Collaborate with cross-functional teams to enhance AI tools and processes.
AI Lab
100% remote

Technical Lead AI Training

A company is looking for a Technical Lead to oversee AI training projects. The role involves acting as a team lead and reviewer, ensuring technical quality across both frontend and backend development. The ideal candidate will contribute to the project's success by reviewing and approving tasks, mentoring junior developers, managing technical debt, and ensuring coherence between backend and frontend systems. The goal is to keep the project aligned with product objectives while maintaining high technical standards.
Key responsibilities:
  • Act as team lead and reviewer for technical quality across frontend and backend.
  • Decompose features into tasks and guide junior developers.
  • Review and approve tasks to ensure alignment with product goals.
  • Mentor junior developers and foster their growth.
  • Manage technical debt and ensure coherence between backend and frontend systems.
  • Collaborate on system design, architecture, and code reviews.
AI Labelling Company
100% remote
New

Freelance AI Trainer - Writers (English) (m/w/d)

An AI Lab is seeking professionals experienced in working with texts to join their innovative team as English AI Trainers. The role involves crafting and editing texts, as well as evaluating AI-generated replies to ensure quality and accuracy. This position is ideal for individuals with expertise in writing, editing, and analyzing content, particularly in the context of AI and language models. As part of a cutting-edge AI lab, you will contribute to the development and refinement of advanced AI systems. This is a remote part-time role that can be flexibly tailored to your availability – from just a few hours per week to full-time.
Key responsibilities:
  • Craft and edit high-quality texts tailored to specific requirements.
  • Evaluate and analyze AI-generated replies for accuracy, relevance, and quality.
  • Collaborate with teams to improve AI language models and content generation processes.
  • Provide feedback and suggestions to enhance AI performance.
  • Conduct research to ensure content aligns with industry standards and user expectations.
AI Lab
100% remote

Project Manager/Account Manager (Senior to Director) (m/f/d)

A company is looking for experienced project managers and account managers who can execute international campaigns on a hands-on basis. The role requires close collaboration with different markets and stakeholders to ensure campaigns are carried out effectively and on time. The position is ideal for candidates with extensive agency experience, especially in implementing traditional campaigns. Experience at renowned agencies like Media Monks or similar is a plus.
Main responsibilities:
  • Planning, execution and monitoring of international campaigns
  • Close collaboration with international markets and stakeholders
  • Ensuring adherence to schedules and budgets
  • Coordination of teams and resources
  • Reporting and analysis of campaign results
Agency
Munich, Germany
100% remote

Frontend developer for HR platform with Angular experience

Reach out to us if you are interested in working with us on the project.
FRATCH
Munich
90% remote

AI Agent Evaluation Analyst (m/f/d)

Industry
Information Technology (IT)
Area
Quality Assurance (QA)

Project info

  • Period
    10.11.2025 - 07.03.2026
  • Capacity
    from 95%
  • Daily rate
    200 - 320€
  • Location
    Amsterdam, Netherlands
  • Languages
    Essential:
    • German
      (Advanced)
    Desirable:
    • English
      (Advanced)
  • Remote
    from 95%

Description

We are looking for a Freelance Agent Evaluation Analyst to take ownership of quality, structure, and insight across the project. This role goes far beyond task-checking - it’s about critical thinking, systems-level analysis, and ensuring clarity, reliability, and consistency at scale. You’ll work as both a hands-on evaluator and an analyst, collaborating with domain experts, delivery managers, and engineers. Beyond reviewing outputs, you’ll be expected to understand the “why” behind the work, identify logical gaps or inconsistencies, and propose meaningful improvements.

This is a flexible, impact-driven role where you’ll have space to grow, contribute ideas, and help shape how evaluation and quality are scaled across the project.

This role is especially well-suited for:

  • Analysts, researchers, or consultants with strong structuring and reasoning skills
  • Junior product managers or strategists curious about AI and evaluation work
  • Smart problem-solvers (students or early-career professionals) who enjoy digging into logic, systems, and edge cases

You do not need a coding background. What matters most is curiosity, intellectual rigor, and the ability to evaluate complex setups with precision.

What you’ll be doing

  • Fully own the QA pipeline for agent evaluation tasks;
  • Review and validate tasks and golden paths created by scenario writers and experts;
  • Spot logical inconsistencies, vague requirements, hidden risks, and unrealistic assumptions;
  • Provide structured feedback and ensure quality alignment across contributors;
  • Train, onboard, and mentor new QA team members;
  • Collaborate with domain experts, delivery managers, and engineers to improve test clarity and coverage;
  • Maintain and improve QA checklists, SOPs, and review guidelines;
  • Contribute to test planning, prioritization, and quality benchmarks;
  • Take initiative to suggest new approaches, tools, and processes that help scale validation and analysis.

Requirements

What you should know / be able to do

  • Strong analytical and critical thinking skills;
  • Attention to detail and reliability - your work can be trusted without double-checking;
  • Experience in manual QA, scenario validation, or similar analytical work;
  • Comfortable working with structured formats (JSON/YAML);
  • Clear written communication and documentation skills;
  • Ability to give constructive feedback and coach others;
  • Capable of working with a wide range of stakeholders: from engineers to directors/VPs.

Nice to have

  • Background in scenario-based testing, test design, or annotation workflows;
  • Experience with AI/LLM evaluation, prompt validation, or agent behavior testing;
  • Some technical independence (e.g., Python skills);
  • Familiarity with MCP / tool-based task execution;
  • Experience working in cross-functional teams across product, delivery, and engineering.

Who you are

  • Detail-obsessed but also able to see the bigger picture;
  • Proactive, independent, and take true ownership of your work;
  • Strong communicator who can turn complex findings into actionable insights;
  • Flexible and motivated to contribute across a variety of tasks and projects;
  • Believe quality is not just checking work, but making the whole product better.