Project details

Recommended projects

New

AI Agent Evaluation Analyst (m/f/d)

We’re on the hunt for QAs for autonomous AI agents for a new project focused on validating and improving complex task structures, policy logic, and agent evaluation frameworks. Throughout the project, you’ll have to balance quality assurance, research, and logical problem-solving. This project opportunity is ideal for people who enjoy looking at systems holistically and thinking through scenarios, implications, and edge cases. You do not need a coding background, but you must be curious, intellectually rigorous, and capable of evaluating the soundness and consistency of complex setups. If you’ve ever excelled in things like consulting, CHGK, Olympiads, case solving, or systems thinking — you might be a great fit. What you’ll be doing: - Reviewing evaluation tasks and scenarios for logic, completeness, and realism. - Identifying inconsistencies, missing assumptions, or unclear decision points. - Helping define clear expected behaviors (gold standards) for AI agents. - Annotating cause-effect relationships, reasoning paths, and plausible alternatives. - Thinking through complex systems and policies as a human would to ensure agents are tested properly. - Working closely with QA, writers, or developers to suggest refinements or edge case coverage.
100% remote
New

MCP & Tools Python Developer (m/f/d)

We’re on the hunt for hands-on Python engineers for a new project focused on developing Model Context Protocol (MCP) servers and internal tools for running and evaluating agent behavior. You’ll implement base methods for agent action verification, integrate with internal and client infrastructures, and help fill tooling gaps across the team. What you’ll be doing: - Developing and maintaining MCP-compatible evaluation servers - Implementing logic to check agent actions against scenario definitions - Creating or extending tools that writers and QAs use to test agents - Working closely with infrastructure engineers to ensure compatibility - Occasionally helping with test writing or debug sessions when needed. Although we’re only looking for experts for this current project, contributors with consistently high-quality submissions may receive an invitation for ongoing collaboration across future projects.
100% remote
New

Senior Data Architect (m/f/d)

We are currently looking for a senior data architect (m/f/d) with experience in international industrial data platform and cloud projects for one of our top customers. Objectives - Achieve high customer satisfaction through reliable delivery, clear communication, and measurable outcomes. - Serve as the link among business, product, engineering, security, compliance, legal, and operations; facilitate informed trade-offs. - Establish and evolve long-term architecture with a balance of security, privacy, performance, cost, resilience, and interoperability. - Enable data-driven decision-making by optimizing data systems for structure, integration, and compliance. Responsibilities - Design and implement data models for efficient storage, retrieval, and analysis at enterprise and application levels. - Organize data at macro level (domains, canonical models, sharing policies) and micro level (logical/physical models); provide golden-source logical models and business rules for data quality. - Identify and document requirements decisive for long-term architecture; develop and document technology, structure, and implementation decisions based on best practices. - Verify day-to-day compliance with architectural decisions; establish quality assurance measures (design reviews, automated checks, guardrails). - Monitor data quality and integrity; ensure compliance with GDPR and security standards; explain decisions and strategies; coach teams. Key tasks and activities - Architecture design and stewardship: own end-to-end architecture considering scalability, performance, reliability, and cost; shape and steer the data landscape beyond implementation. - Data modeling: create conceptual, logical, and physical models; apply Data Vault 2.0, dimensional modeling (Kimball star schemas), and 3NF as appropriate. - Technology selection: evaluate and select data technologies, ETL/ELT tools, and cloud services within the existing IT landscape; develop concepts and technology proposals. 
- Governance and quality: define standards for data quality, metadata, lineage, and access; embed guardrails in data pipelines; ensure GDPR compliance. - Stakeholder management: act as the technical link between data engineers, data scientists, and business stakeholders; evaluate and communicate architectures and trade-offs. - Documentation: produce durable architecture documentation, including data flow diagrams, interface descriptions, domain maps, and decision logs.
100% remote

Freelance Mechanical Engineer with Python Experience (m/f/d)

For an AI lab, we are looking for a Mechanical Engineer with Python experience to train an AI model (Large Language Model - LLM). GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join, you’ll have the opportunity to collaborate on these projects. Although every project is unique, you might typically: - Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in Mechanical Engineering, while also developing references and examples of tasks. - Experts Acquisition: Assess the qualification tests of experts, ensuring their competency. - Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines. - Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote

Freelance Cybersecurity Consultant for AI Red Teaming

For an AI lab, we are looking for cybersecurity consultants to train an AI model (Large Language Model - LLM). You help AI make sense of the world. As a consultant, you may be invited to take part in online projects to train the model in your domain of expertise. This flexible role accommodates both experts seeking part-time engagement (at least a few hours per week) and those interested in full-time opportunities. - Evaluate and red team AI models, agents, and machine learning systems for vulnerabilities and safety risks. - Create offline, reproducible, and auto-evaluable test cases to test the safety and capability of AI agents. - Develop and implement automation scripts, custom tools, environments, and test harnesses. - Lead or contribute to security research initiatives, especially in AI safety, creating and implementing realistic and challenging attack scenarios for the model. - Advise on cybersecurity best practices and policy implications.
AI Lab
100% remote

AI Evaluation Consultant (m/f/d)

We are seeking an analytical and technically-minded professional to: - Evaluate AI outputs and processes - Ensure quality, accuracy, and reliability - Identify logical errors, risks, and structural inconsistencies - Provide actionable insights and recommendations to the team Ideal candidates: - Consultants, auditors, analysts, data researchers, or business/technical analysts with strong reasoning skills - Professionals curious about AI, process improvement, and quality evaluation - Problem-solvers who enjoy analyzing complex systems, logic, and scenarios Key Responsibilities: - Lead evaluation of AI outputs and related processes - Review tasks against expected/ideal scenarios; identify gaps and risks - Provide structured, actionable recommendations to engineers, domain experts, and managers - Maintain and improve evaluation guidelines, checklists, SOPs - Suggest new approaches, tools, and processes to enhance AI evaluation
AI Labs
100% remote

Freelance Electrical Engineer with Python Experience (m/f/d)

Generative AI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. Although every project is unique, you might typically: - Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in Physics, while also developing references and examples of tasks. - Experts Acquisition: Assess the qualification tests of experts, ensuring their competency. - Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines. - Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote

Freelance Automotive Engineer (with Python) - Quality Assurance / AI Trainer

Generative AI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. Although every project is unique, you might typically: - Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in Physics, while also developing references and examples of tasks. - Experts Acquisition: Assess the qualification tests of experts, ensuring their competency. - Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines. - Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote

Freelance Physics Expert (with Python) - Quality Assurance / AI Trainer

Generative AI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. Although every project is unique, you might typically: - Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in Physics, while also developing references and examples of tasks. - Experts Acquisition: Assess the qualification tests of experts, ensuring their competency. - Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines. - Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote

Freelance Java Developer (all genders)

For an AI lab, we’re looking for a Java Developer to train an AI model (large language model - LLM). You’ll help AI make sense of the world. As a consultant, you may be invited to join online projects to train the model in your area of expertise. This flexible role suits both experts seeking part-time work (minimum a few hours per week) and those interested in full-time opportunities. - Code generation and code review - Prompt evaluation and complex data annotation - Training and evaluation of large language models - Benchmarking and agent-based code execution in sandboxed environments - Working across multiple programming languages - Adapting guidelines for new domains and use cases - Following project-specific rubrics and requirements - Collaborating with project leads, solution engineers, and supply managers on complex or experimental projects
AI Lab
100% remote

Freelance Ruby Developer (m/f/d)

For an AI lab we are looking for a Ruby Developer to train an AI model (Large Language Model - LLM). You help AI make sense of the world. As a consultant, you may be invited to take part in online projects to train the model in your domain of expertise. This flexible role accommodates both experts seeking part-time engagement (at least a few hours per week) and those interested in full-time opportunities. - Code generation and code review - Prompt evaluation and complex data annotation - Training and evaluation of large language models - Benchmarking and agent-based code execution in sandboxed environments - Working across multiple programming languages (Python, JavaScript/TypeScript, Rust, SQL, etc.) - Adapting guidelines for new domains and use cases - Collaborating with project leads, solution engineers, and supply managers on complex or experimental projects
AI Lab
100% remote

Freelance Biology Expert for AI Model Training (m/f/d)

An AI lab is looking for freelance biology experts to evaluate AI models. The goal of the project is to assess the performance, accuracy, and reliability of AI models applied in biology contexts (all areas). The role involves working closely with the development team to ensure the models meet industry standards and provide actionable insights. This is a remote part-time role that can be flexibly tailored to your availability, from just a few hours per week to full-time. Key responsibilities: - Evaluate AI models for biology applications. - Analyze model outputs and provide feedback for improvement. - Collaborate with the development team to ensure alignment with industry standards. - Document findings and recommendations for model optimization. - Conduct tests to validate model performance and reliability.
AI Lab
100% remote

Freelance Chemistry Expert for AI Model Training (m/f/d)

An AI lab is looking for freelance chemistry experts to evaluate AI models. The goal of the project is to assess the performance, accuracy, and reliability of AI models applied in chemistry contexts. The role involves working closely with the development team to ensure the models meet industry standards and provide actionable insights. This is a remote part-time role that can be flexibly tailored to your availability, from just a few hours per week to full-time. Key responsibilities: - Evaluate AI models for chemistry applications. - Analyze model outputs and provide feedback for improvement. - Collaborate with the development team to ensure alignment with industry standards. - Document findings and recommendations for model optimization. - Conduct tests to validate model performance and reliability.
AI Lab
100% remote
New

Evaluation Scenario Writer (m/f/d)

We’re looking for someone who can design realistic and structured evaluation scenarios for LLM-based agents. You’ll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You’ll work to ensure each scenario is clearly defined, well-scored, and easy to execute and reuse. You’ll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions. Although every project is unique, your work will typically involve: - Designing structured test scenarios based on real-world tasks. - Defining the golden path and acceptable agent behavior. - Annotating task steps, expected outputs, and edge cases. - Working with devs to test your scenarios and improve clarity. - Reviewing agent outputs and adapting tests accordingly.
100% remote

AI Consultant - Machine Learning (m/f/d)

For an AI lab, we are looking for machine learning experts to train an AI model (Large Language Model - LLM). GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join, you’ll have the opportunity to collaborate on these projects. Although every project is unique, you might typically: - Design original computational STEM problems that simulate real scientific workflows - Create problems that require Python programming to solve - Ensure problems are computationally intensive and cannot be solved manually within reasonable timeframes (days/weeks) - Develop problems requiring non-trivial reasoning chains and creative problem-solving approaches - Verify solutions using Python with standard libraries (numpy, pandas, scipy, sklearn) - Document problem statements clearly and provide verified correct answers
AI Lab
100% remote

AI Consultant for Vibe Coding (m/f/d)

An AI lab is looking for an AI Trainer for Vibe Coding. This role involves producing accurate, well-reasoned outputs across diverse domains, leveraging automation and AI tools. The position requires expertise in coding and optimizing Python scripts, handling large datasets, improving AI-generated content, and formatting and troubleshooting technical workflows. This is a remote part-time role that can be flexibly tailored to your availability, from just a few hours per week to full-time. Key responsibilities: - Conduct advanced web research and data mining using multiple tools to locate and extract information from official sources. Use LLMs and advanced prompts to refine search strategies and validate data accuracy by cross-referencing authoritative sources. - Perform web scraping and data extraction by navigating complex website structures and multi-level pages (regions → companies → detailed pages). Handle dynamic content, archived pages, and various HTML formats, and organize extracted data into clean, well-formatted CSV files. - Write and optimize Python scripts for data processing and analysis using libraries such as pandas, BeautifulSoup, Selenium, and matplotlib. Transform raw data into structured formats (CSV, JSON, tables) and create visualizations when required. - Carry out data processing and quality assurance by cleaning, validating, and structuring datasets. Ensure data integrity across multiple sources, apply formatting specifications, and run verification steps to maintain high output quality. - Apply strong problem-solving and task execution skills to break down complex workflows, troubleshoot technical issues independently, and adapt quickly between different domains and task types with minimal supervision. - Produce clear documentation and high-quality outputs that follow exact requirements for file formats, naming conventions, and data structure. Maintain reproducible workflows and well-organized code.
AI Lab
100% remote

Freelance Statistics Expert with Python Experience (m/f/d)

For an AI lab, we are looking for a Statistics Expert with Python experience to train an AI model (Large Language Model - LLM). GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join, you’ll have the opportunity to collaborate on these projects. Although every project is unique, you might typically: - Generate prompts that challenge AI. - Define comprehensive scoring criteria to evaluate the accuracy of the AI’s answers. - Correct the model’s responses based on your domain-specific knowledge.
AI Lab
100% remote

Freelance Civil Engineer with Python Experience (m/f/d)

A company is looking for freelance civil engineering experts to evaluate AI models. The goal of the project is to assess the performance, accuracy, and reliability of AI models applied in civil engineering contexts. The role involves working closely with the development team to ensure the models meet industry standards and provide actionable insights. Key responsibilities: - Evaluate AI models for civil engineering applications. - Analyze model outputs and provide feedback for improvement. - Collaborate with the development team to ensure alignment with industry standards. - Document findings and recommendations for model optimization. - Conduct tests to validate model performance and reliability.
AI Lab
100% remote

Dentist for Training AI Models (m/f/d)

For an AI lab we are looking for German-speaking dentists to train an AI model (Large Language Model - LLM). As a consultant, you may be invited to take part in online projects to train the model in your area of expertise. This flexible role works for both experts who want part-time work (at least a few hours a week) and those interested in full-time roles. Although every project is unique, you might typically: - Collaborate with the AI lab to share your domain knowledge in dentistry. - Join online training sessions to improve the AI model's understanding. - Review and validate AI-generated content for accuracy and relevance. - Offer insights and feedback to boost the model's performance. - Take on flexible project-based tasks, adapting to each project's needs.
AI Lab
100% remote

AI Consultants - Data Science (m/f/d)

We are seeking experienced data scientists to create computationally intensive data science problems for an advanced AI evaluation project. This is a remote, project-based opportunity for experts who can design challenging problems that require computational methods to solve and mirror the full data science lifecycle, from data acquisition and processing to statistical analysis and actionable business insights. What you'll do: - Design original computational data science problems that simulate real-world analytical workflows across industries (telecom, finance, government, e-commerce, healthcare) - Create problems requiring Python programming to solve (using pandas, numpy, scipy, sklearn, statsmodels, matplotlib, seaborn) - Ensure problems are computationally intensive and cannot be solved manually within reasonable timeframes (days/weeks) - Develop problems requiring non-trivial reasoning chains in data processing, statistical analysis, feature engineering, predictive modeling, and insight extraction - Create deterministic problems with reproducible answers: avoid stochastic elements or require fixed random seeds for exact reproducibility - Base problems on real business challenges: customer analytics, risk assessment, fraud detection, forecasting, optimization, and operational efficiency - Design end-to-end problems spanning the complete data science pipeline (data ingestion → cleaning → EDA → modeling → validation → deployment considerations) - Incorporate big data processing scenarios requiring scalable computational approaches - Verify solutions using Python with standard data science libraries and statistical methods - Document problem statements clearly with realistic business contexts and provide verified correct answers
AI Lab
Munich, Germany
100% remote
New

Data Engineer (m/f/d)

A company is looking for an experienced Data Engineer to carry out a migration from Snowflake to ClickHouse. The focus is on using Apache Spark for data processing and on managing and optimizing Kubernetes environments. The goal is to build and operate a powerful and scalable data platform. - Executing the migration from Snowflake to ClickHouse - Developing and optimizing data pipelines with Apache Spark - Managing and optimizing Kubernetes clusters - Ensuring the performance and scalability of the data platform - Implementing solutions in Python - Optional: Working with Snowplow for data analytics
Media / Publishing
Munich, Germany
100% remote
New

SAP FI/CO Consultant (m/f/d) – Focus SAP R/3 - S/4HANA Transition

For an industrial company, we are looking for an experienced SAP FI/CO Senior Consultant (m/f/d) with a strong focus on SAP R/3 (ECC) and proven experience in S/4HANA transition projects. The goal of the project is the functional and system-side analysis of the existing R/3 processes in the Finance and Controlling area and support in preparing and implementing the migration to S/4HANA. Responsibilities - Analysis of existing FI/CO processes in SAP R/3 (general ledger, accounts receivable/payable, asset accounting, CO objects, CO-PA, etc.) - Conducting a gap analysis between SAP ECC and S/4HANA - Assessing the impact of new S/4 features (Universal Journal, Business Partner, new asset accounting, etc.) - Identifying optimization potential and recommendations for process standardization - Creating a preparation and migration plan (delivery: end of January) - Running a remote workshop with the business units - Advising on FI/CO best practices in industrial environments - Preparing professional concept and documentation materials
100% remote

Mathematician with Python Experience (m/f/d)

For an AI lab, we are looking for mathematicians with Python experience to train an AI model (Large Language Model - LLM). As a consultant, you may be invited to take part in online projects to train the model in your domain of expertise. This flexible role accommodates both experts seeking part-time engagement (at least a few hours per week) and those interested in full-time opportunities. Although every project is unique, you might typically: - Design original computational mathematics problems that simulate real mathematical research workflows. - Create problems requiring Python programming to solve (using numpy, scipy, sympy). - Ensure problems are computationally intensive and cannot be solved manually within reasonable timeframes (days/weeks). - Develop problems requiring non-trivial reasoning chains in areas like number theory, combinatorics, graph theory, and numerical analysis. - Base problems on real research challenges or practical applications from mathematical practice. - Verify solutions using Python with standard mathematical libraries. - Document problem statements clearly and provide verified correct answers. Support in: - Number Theory: Prime factorization, Diophantine equations, modular arithmetic, cryptographic computations. - Combinatorics: Enumerations, partitions, generating functions, combinatorial optimization. - Graph Theory: Network analysis, path finding, graph coloring, spanning trees. - Numerical Analysis: Root finding, numerical integration, differential equations, matrix computations. - Discrete Mathematics: Recurrence relations, algorithmic complexity, discrete optimization. - Algebra: Polynomial computations, group theory calculations, matrix decompositions.
AI Lab
100% remote

Physicist with Python Experience (m/f/d)

For an AI lab, we are looking for physicists with Python experience to train an AI model (Large Language Model - LLM). GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join the platform as a physicist, you’ll have the opportunity to collaborate on these projects. Although every project is unique, you might typically: - Design original computational physics problems that simulate real research workflows. - Create problems requiring Python programming to solve (using numpy, scipy, sympy). - Ensure problems are computationally intensive and cannot be solved manually within reasonable timeframes (days/weeks). - Develop problems requiring non-trivial reasoning chains. - Base problems on real research challenges or practical applications from physics practice. - Verify solutions using Python with standard libraries. - Document problem statements clearly and provide verified correct answers.
AI Lab
100% remote

Chemist with Python Experience (m/f/d)

GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join the platform as an AI Tutor in Chemistry, you’ll have the opportunity to collaborate on these projects. Although every project is unique, you might typically: - Generate prompts that challenge AI. - Define comprehensive scoring criteria to evaluate the accuracy of the AI’s answers. - Correct the model’s responses based on your domain-specific knowledge.
AI Lab
100% remote

Biologist with Python Experience (m/f/d)

GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join the platform as an AI Tutor in Biology, you’ll have the opportunity to collaborate on these projects. Although every project is unique, you might typically: - Generate prompts that challenge AI. - Define comprehensive scoring criteria to evaluate the accuracy of the AI’s answers. - Correct the model’s responses based on your domain-specific knowledge.
AI Lab
100% remote

Sales Manager for a Media Company (m/f/d)

- Independent marketing of our brand portfolio through innovative licensing, brand and lifestyle collaborations to create unique brand experiences - Responsibility for defined industries within the sales team, including strategic development, identification and outreach of target customers as well as ongoing market and trend analyses - Building new partnerships by acquiring new licensing partners and supporting and developing existing licensees - Planning, managing and controlling the budget for the industries you’re responsible for with a strategic focus - Participation in relevant industry fairs to pick up trends and acquire potential partners - Adapting marketing materials and proposals to tailor them to individual customer needs - Acting as an interface & first point of contact for external sales partners, including assessing and coordinating sales potential - Promoting effective collaboration between sales and brand management - Maintaining and utilizing the CRM system as well as potential overviews
Media Company
Hamburg, Germany
100% remote
New

Senior Regulatory Compliance Expert (FDA Inspection Preparation) (m/f/d)

A company is looking for a Senior Regulatory Compliance Expert to support its team in getting ready for FDA inspections. The role includes conducting mock inspections, providing strategic advice on inspection readiness, and assisting with pre-approval and routine inspections. The ideal candidate has extensive expertise in compliance with legal requirements, especially FDA standards, and plays a key role in ensuring the company meets global compliance demands. - Conduct mock inspections according to FDA standards - Provide strategic advice on inspection readiness - Support pre-approval and routine inspections
Pharma
Munich, Germany
100% remote
New

Commissioning & Qualification (C&Q) Engineer (m/f/d)

A company is looking for an experienced Commissioning & Qualification (C&Q) Engineer to qualify and commission production equipment according to GMP standards. The goal of the project is to ensure the technical and organizational prerequisites for GMP-compliant qualification of the production equipment. - Independent execution of commissioning and qualification activities, especially in IOQ - Operation of PCS7 systems - Working with single-use equipment - Carrying out commissioning and qualification activities for production equipment - Ensuring all technical and organizational prerequisites for C&Q - GMP-compliant qualification of the related production equipment
Pharma
Munich, Germany
100% remote
New

Quality Compliance Auditor (GCP/GCLP/GVP) (m/f/d)

A company is looking for an experienced Quality Compliance Auditor who will be responsible for ensuring compliance with GCP, GCLP and GVP standards. The project aims to conduct internal and external audits, prepare and support regulatory inspections, and identify compliance gaps and derive corrective actions. The role includes planning and conducting audits, assisting with regulatory inspections and ensuring compliance with ICH guidelines as well as EMA/FDA regulations. - Conducting internal and external audits (GCP, GCLP, GVP) - Preparing and supporting regulatory inspections (e.g. MHRA, FDA, EMA) - Identifying compliance gaps and deriving corrective actions
Pharma
Germany
100% remote

Frontend Developer with Angular Experience for an HR Platform

Reach out to us if you are interested in working with us on the project.
FRATCH
Munich
90% remote

AI Agent Evaluation Analyst (m/f/d)

New
Industry
Information Technology (IT)
Areas
Quality Assurance (QA)
Research and Development (R&D)

Project info

  • Daily rate
    from 280€
  • Language
    • English
      (Advanced)
  • Remote
    100%

Description

We’re on the hunt for QAs for autonomous AI agents for a new project focused on validating and improving complex task structures, policy logic, and agent evaluation frameworks. Throughout the project, you’ll have to balance quality assurance, research, and logical problem-solving. This project opportunity is ideal for people who enjoy looking at systems holistically and thinking through scenarios, implications, and edge cases.

You do not need a coding background, but you must be curious, intellectually rigorous, and capable of evaluating the soundness and consistency of complex setups. If you’ve excelled at consulting, CHGK, Olympiads, case solving, or systems thinking, you might be a great fit.

What you’ll be doing:

  • Reviewing evaluation tasks and scenarios for logic, completeness, and realism.
  • Identifying inconsistencies, missing assumptions, or unclear decision points.
  • Helping define clear expected behaviors (gold standards) for AI agents.
  • Annotating cause-effect relationships, reasoning paths, and plausible alternatives.
  • Thinking through complex systems and policies as a human would to ensure agents are tested properly.
  • Working closely with QA, writers, or developers to suggest refinements or edge case coverage.

Requirements

  • Excellent analytical thinking: Can reason about complex systems, scenarios, and logical implications.
  • Strong attention to detail: Can spot contradictions, ambiguities, and vague requirements.
  • Familiarity with structured data formats: Can read (but not necessarily write) JSON/YAML.
  • Can assess scenarios holistically: What’s missing, what’s unrealistic, what might break?
  • Good communication and clear writing (in English) to document your findings.

We also value applicants who have:

  • Experience with policy evaluation, logic puzzles, case studies, or structured scenario design.
  • Background in consulting, academia, olympiads (e.g. logic/math/informatics), or research.
  • Exposure to LLMs, prompt engineering, or AI-generated content.
  • Familiarity with QA or test-case thinking (edge cases, failure modes, “what could go wrong”).
  • Some understanding of how scoring or evaluation works in agent testing (precision, coverage, etc.).