Project details

Recommended projects

New

AI Agent Evaluation Analyst (m/f/d)

We’re on the hunt for QAs for autonomous AI agents for a new project focused on validating and improving complex task structures, policy logic, and agent evaluation frameworks. Throughout the project, you’ll have to balance quality assurance, research, and logical problem-solving. This project opportunity is ideal for people who enjoy looking at systems holistically and thinking through scenarios, implications, and edge cases. You do not need a coding background, but you must be curious, intellectually rigorous, and capable of evaluating the soundness and consistency of complex setups. If you’ve ever excelled in things like consulting, CHGK, Olympiads, case solving, or systems thinking — you might be a great fit. What you’ll be doing: - Reviewing evaluation tasks and scenarios for logic, completeness, and realism. - Identifying inconsistencies, missing assumptions, or unclear decision points. - Helping define clear expected behaviors (gold standards) for AI agents. - Annotating cause-effect relationships, reasoning paths, and plausible alternatives. - Thinking through complex systems and policies as a human would to ensure agents are tested properly. - Working closely with QA, writers, or developers to suggest refinements or edge case coverage.
100% remote
New

Evaluation Scenario Writer (m/f/d)

We’re looking for someone who can design realistic and structured evaluation scenarios for LLM-based agents. You’ll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You’ll work to ensure each scenario is clearly defined, well-scored, and easy to execute and reuse. You’ll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions. Although every project is unique, your work might typically include: - Designing structured test scenarios based on real-world tasks. - Defining the golden path and acceptable agent behavior. - Annotating task steps, expected outputs, and edge cases. - Working with devs to test your scenarios and improve clarity. - Reviewing agent outputs and adapting tests accordingly.
100% remote

AI Evaluation Consultant (m/f/d)

We are seeking an analytical and technically minded professional to: - Evaluate AI outputs and processes - Ensure quality, accuracy, and reliability - Identify logical errors, risks, and structural inconsistencies - Provide actionable insights and recommendations to the team Ideal candidates: - Consultants, auditors, analysts, data researchers, or business/technical analysts with strong reasoning skills - Professionals curious about AI, process improvement, and quality evaluation - Problem-solvers who enjoy analyzing complex systems, logic, and scenarios Key Responsibilities: - Lead evaluation of AI outputs and related processes - Review tasks against expected/ideal scenarios; identify gaps and risks - Provide structured, actionable recommendations to engineers, domain experts, and managers - Maintain and improve evaluation guidelines, checklists, and SOPs - Suggest new approaches, tools, and processes to enhance AI evaluation
AI Labs
100% remote

Freelance Chemistry Expert for AI Model Training (m/f/d)

An AI lab is looking for freelance chemistry experts to evaluate AI models. The goal of the project is to assess the performance, accuracy, and reliability of AI models applied in chemistry contexts. The role involves working closely with the development team to ensure the models meet industry standards and provide actionable insights. This is a remote part-time role that can be flexibly tailored to your availability, from just a few hours per week to full-time. Key responsibilities: - Evaluate AI models for chemistry applications. - Analyze model outputs and provide feedback for improvement. - Collaborate with the development team to ensure alignment with industry standards. - Document findings and recommendations for model optimization. - Conduct tests to validate model performance and reliability.
AI Lab
100% remote

Freelance Biology Expert for AI Model Training (m/f/d)

An AI lab is looking for freelance biology experts to evaluate AI models. The goal of the project is to assess the performance, accuracy, and reliability of AI models applied in biology contexts (all areas). The role involves working closely with the development team to ensure the models meet industry standards and provide actionable insights. This is a remote part-time role that can be flexibly tailored to your availability, from just a few hours per week to full-time. Key responsibilities: - Evaluate AI models for biology applications. - Analyze model outputs and provide feedback for improvement. - Collaborate with the development team to ensure alignment with industry standards. - Document findings and recommendations for model optimization. - Conduct tests to validate model performance and reliability.
AI Lab
100% remote

Freelance Civil Engineer with Python Experience (m/f/d)

A company is looking for freelance civil engineering experts to evaluate AI models. The goal of the project is to assess the performance, accuracy, and reliability of AI models applied in civil engineering contexts. The role involves working closely with the development team to ensure the models meet industry standards and provide actionable insights. Key responsibilities: - Evaluate AI models for civil engineering applications. - Analyze model outputs and provide feedback for improvement. - Collaborate with the development team to ensure alignment with industry standards. - Document findings and recommendations for model optimization. - Conduct tests to validate model performance and reliability.
AI Lab
100% remote

Freelance Automotive Engineer (with Python) - Quality Assurance / AI Trainer

Generative AI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. Although every project is unique, you might typically: - Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in Physics, while also developing references and examples of tasks. - Experts Acquisition: Assess the qualification tests of experts, ensuring their competency. - Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines. - Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote
New

Requirement and Content Manager (m/f/d)

A company is looking for support for a project focused on optimizing the buy and leave journey to improve customer acquisition and retention. The goal is to increase efficiency in the value chain, speed up time to market, and reduce the total cost of ownership (TCO). The Adobe Experience Manager (AEM) platform plays a central role, especially for implementing new features like compositions, templates, and micro-frontends. Main tasks: - Definition and design of requirements in the Adobe Experience Manager CMS area, including setting the development order of compositions, components, templates, and micro-frontends. - Supporting documentation and feedback loops with tools like Jira and Confluence. - Project-related consulting of development teams and requirement owners during the development phase. - Analysis of the existing Adobe Experience Manager CMS infrastructure and deriving recommendations to optimize content and site structure, AEM interfaces, and performance. - Creating documentation on using the provided compositions & components and sharing this information with internal teams. - Professional consulting of business and technology departments as well as external partners as part of the change program. - Advising on technical requirements, including content structure, site structure, micro-frontends, product data modeling, compositions & components, templates, headless CMS, and AEM interfaces. - Support on special topics like accessibility, CIAM, multilingual, personalization, and campaigning.
Telecommunication
Munich, Germany
100% remote

Freelance Physics Expert (with Python) - Quality Assurance / AI Trainer

Generative AI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. Although every project is unique, you might typically: - Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in Physics, while also developing references and examples of tasks. - Experts Acquisition: Assess the qualification tests of experts, ensuring their competency. - Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines. - Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote
New

Management Consulting (Senior Level)

A company is seeking support for the "SOx way forward" project. The goal of the project is implementing and expanding IT General Controls (ITGC), Access Management, and Super User Monitoring in various IT systems. The task includes gathering, detailing, and managing the implementation of the necessary requirements. - Planning, coordinating, and gathering SOx-relevant requirements - Gathering, detailing, and documenting requirements and implementing them in the IT and network infrastructure - Independently managing the analysis and implementation of requirements - Reviewing the developed design specifications based on the requirements - Preparing, planning, and advising during the test phase
Telecommunication
Munich, Germany
100% remote

Business Analyst – SAP S/4HANA Output Management (f/m/d)

- A company is looking for an experienced business analyst to support the transformation from SAP ECC to S/4HANA Utilities. - The project aims to analyze, document, and optimize output and archiving processes, as well as create functional designs and specifications. - The analyst will work closely with product owners, IT, and business units to align on feasibility, effort, and prioritization of requirements.
Energy
Munich, Germany
100% remote

Senior Project Manager Customer Interaction

A company is seeking support for a project to evaluate, implement, and further develop quality surveys in digital channels. The goal of the project is to increase customer satisfaction in digital channels by evaluating, implementing, and enhancing survey methods that enable consistent collection of customer satisfaction data across all channels. Potential improvements should be identified and implemented. The role includes consulting on, developing, and implementing measures to capture and improve customer satisfaction in digital channels. Main tasks: - Consulting on survey methods for gathering customer experience and quality in digital channels, market standards, benchmarks and future orientation. - Developing a future model for quality in digital channels, relevant KPIs and survey methods as well as standard processes. - Implementing the decided measures including interface management and coordination with technology partners and social partners. - Testing implemented measures to collect data and ensure all required criteria are met. - Consolidating and listing existing and missing customer survey methods/quality KPIs across all responsible digital channels. - Advising on the preparation of decision templates and implementing the necessary measures. - Identifying potential improvements and developing a standard process for transparency and implementation.
Telecommunication
Munich, Germany
100% remote

Freelance Statistics Expert with Python Experience (m/f/d)

For an AI lab, we are looking for a Statistics Expert with Python experience to train an AI model (Large Language Model - LLM). GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join, you’ll have the opportunity to collaborate on these projects. Although every project is unique, you might typically: - Generate prompts that challenge AI. - Define comprehensive scoring criteria to evaluate the accuracy of the AI’s answers. - Correct the model’s responses based on your domain-specific knowledge.
AI Lab
100% remote

Project Manager Magazines / Magazine Production (m/f/d)

- Responsibility for coordinating and managing the entire production process of magazine publications - Planning and monitoring issue structure, deadlines, advertisements, and workflows - Close collaboration with editorial, publishing management, marketing, IT, sales, printers, and service providers - Quality assurance of layouts, copy, and print approvals - Cost calculation and organization of supplementary products (e.g., inserts, posters, expansions) - Active role in strategic projects, conferences, and the launch of new formats
Media Company
Munich, Germany
50% remote

Freelance Ruby Developer (m/f/d)

For an AI lab we are looking for a Ruby Developer to train an AI model (Large Language Model - LLM). You help AI make sense of the world. As a consultant, you may be invited to take part in online projects to train the model in your domain of expertise. This flexible role accommodates both experts seeking part-time engagement (at least a few hours per week) and those interested in full-time opportunities. - Code generation and code review - Prompt evaluation and complex data annotation - Training and evaluation of large language models - Benchmarking and agent-based code execution in sandboxed environments - Working across multiple programming languages (Python, JavaScript/TypeScript, Rust, SQL, etc.) - Adapting guidelines for new domains and use cases - Collaborating with project leads, solution engineers, and supply managers on complex or experimental projects
AI Lab
100% remote

Freelance Mechanical Engineer with Python Experience (m/f/d)

For an AI lab, we are looking for a Mechanical Engineer with Python experience to train an AI model (Large Language Model - LLM). GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join, you’ll have the opportunity to collaborate on these projects. Although every project is unique, you might typically: - Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in Mechanical Engineering, while also developing references and examples of tasks. - Experts Acquisition: Assess the qualification tests of experts, ensuring their competency. - Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines. - Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote

Product Manager POS / Checkout Systems (m/f/d)

A company is looking for an experienced product manager, ideally with a background in the hospitality sector. The goal of this project is to develop innovative product solutions and optimize existing processes to enhance the customer experience and strengthen the company’s market position. The product manager will play a key role in defining and executing product strategies and will work closely with internal teams and external stakeholders. Travel expenses will not be covered, so the candidate should preferably be based in Berlin or willing to cover costs themselves. - Lead and oversee the entire project lifecycle in Cloud POS - Develop and implement project plans, schedules, and budgets - Coordinate between different teams and stakeholders - Ensure compliance with project goals and requirements - Identify and manage risks and issues - Report to management and other relevant parties
POS Startup
Berlin, Germany
100% remote

ERP Transformation Manager (m/f/d)

An established company is looking for an experienced ERP Transformation Manager to take full responsibility for planning and steering a comprehensive ERP transformation program. The project's goal is harmonizing processes, implementing a new ERP system, and meeting IFRS requirements. The ERP Transformation Manager will analyze, redesign, and standardize the commercial core processes in civil and rail construction. This includes translating IFRS requirements into system structures and posting logic, closely coordinating with Finance, Controlling, Project Management, and IT departments. The role includes managing the ERP rollout, including fit-gap analysis, process design, test management, and migration. In addition, a unified reporting and KPI framework for group financial statements and project management will be established. The manager will act as the central interface between operational units, Finance, management, and the group, and will set up a sustainable change and training concept for users. - Planning and steering the ERP transformation program (IFRS transition, process harmonization, ERP rollout) - Analyzing, redesigning, and standardizing commercial core processes - Translating IFRS requirements into system structures and posting logic - Managing the ERP rollout, including fit-gap analysis, process design, test management, and migration - Building a unified reporting and KPI framework - Stakeholder management and ensuring smooth communication - Leading interdisciplinary project teams and managing external consultants and implementation partners - Establishing a sustainable change and training concept - Ensuring measurable process improvements after the ERP system goes live
Infrastructure Construction
Eisenach, Germany
70% remote

Freelance Electrical Engineer with Python Experience (m/f/d)

Generative AI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. Although every project is unique, you might typically: - Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in Physics, while also developing references and examples of tasks. - Experts Acquisition: Assess the qualification tests of experts, ensuring their competency. - Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines. - Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote

Freelance Cybersecurity Consultant for AI Red Teaming

For an AI lab, we are looking for cybersecurity consultants to train an AI model (Large Language Model - LLM). You help AI make sense of the world. As a consultant, you may be invited to take part in online projects to train the model in your domain of expertise. This flexible role accommodates both experts seeking part-time engagement (minimum a few hours per week) and those interested in full-time opportunities. - Evaluate and red team AI models, agents, and machine learning systems for vulnerabilities and safety risks. - Create offline reproducible & auto-evaluable test cases to test safety & capability of AI agents. - Develop and implement automation scripts, custom tools, environments and test harnesses. - Lead or contribute to security research initiatives, especially in AI safety, creating and implementing realistic and challenging attack scenarios for the model. - Advise on cybersecurity best practices and policy implications.
AI Lab
100% remote

Developer for Consent Management Implementation (m/f/d)

To replace the consent layers previously provided by third-party CMPs on the web for our international brands, these layers need to be reimplemented so they can be operated and served in-house. This requires solid knowledge of TypeScript, Vue.js, and traditional web presentation technologies (HTML and CSS). The goal is to deliver executable code that implements all requirements and includes automated tests that prove correct functionality. Scope of the engagement: The focus of the service is on developing elements for decision-making on the approach and on implementing measures along the resulting project path. This specifically includes the following service packages: - Code implementation - Implementation of executable tests that must pass on delivery, with test coverage >= 80% - Creation of code documentation - Creation of brand-specific cmp-config files. - Creation of a project (including asset management requirements) as a copy of the consent management platform. - Removal of netID references. - Creation of brand-specific settings and files for custom purposes/providers. - Adding new brand-specific CSS themes (variable values, logos, etc.). - Inclusion of the required official IAB GVL translations (ES, FR) in the weekly synchronization with the GVL. - Implementation of I18n and preparation of brand-specific data sources - Implementation of PMC 2.0 backend usage modules - Implementation of the playout logic - Implementation of the layer initialization process (mode=default and mode=resurface) - CDN upload and release process - Project documentation Project implementation: - The desired result should be written in TypeScript and Vue.js, built with Vite, and tested with Vitest.
Telecommunications
Karlsruhe, Germany
100% remote

Freelance Java Developer (all genders)

For an AI lab, we’re looking for a Java Developer to train an AI model (large language model - LLM). You’ll help AI make sense of the world. As a consultant, you may be invited to join online projects to train the model in your area of expertise. This flexible role suits both experts seeking part-time work (minimum a few hours per week) and those interested in full-time opportunities. - Code generation and code review - Prompt evaluation and complex data annotation - Training and evaluation of large language models - Benchmarking and agent-based code execution in sandboxed environments - Working across multiple programming languages - Adapting guidelines for new domains and use cases - Following project-specific rubrics and requirements - Collaborating with project leads, solution engineers, and supply managers on complex or experimental projects
AI Lab
100% remote
New

Commissioning & Qualification (C&Q) Engineer (m/f/d)

A company is looking for an experienced Commissioning & Qualification (C&Q) Engineer to qualify and commission production equipment according to GMP standards. The goal of the project is to ensure the technical and organizational prerequisites for GMP-compliant qualification of the production equipment. - Independent execution of commissioning and qualification activities, especially in IOQ - Operation of PCS7 systems - Working with single-use equipment - Carrying out commissioning and qualification activities for production equipment - Ensuring all technical and organizational prerequisites for C&Q - GMP-compliant qualification of the related production equipment
Pharma
Munich, Germany
100% remote

Freelance Editor (m/f/d)

- You create topic briefs, research and write expert (guide) texts in a sophisticated style and edit the contributions of our freelance authors - The topics address hobby gardeners (gardening and plants) and also cover living and furnishing, design and decor, do-it-yourself, and cooking and nutrition - In close exchange with colleagues, readers and experts, you develop exciting topics and prepare them appropriately for the target audience - Maintaining and expanding press contacts, as well as ordering photo material in the garden, living and decor area, are also part of your tasks - Optionally, you organize and carry out photo shoots and also attend press events and trade fairs
Media Company
Munich, Germany
50% remote

AI Consultants - Data Science (m/f/d)

We are seeking experienced data scientists to create computationally intensive data science problems for an advanced AI evaluation project. This is a remote, project-based opportunity for experts who can design challenging problems that require computational methods to solve and mirror the full data science lifecycle - from data acquisition and processing to statistical analysis and actionable business insights. What You'll Do: - Design original computational data science problems that simulate real-world analytical workflows across industries (telecom, finance, government, e-commerce, healthcare) - Create problems requiring Python programming to solve (using pandas, numpy, scipy, sklearn, statsmodels, matplotlib, seaborn) - Ensure problems are computationally intensive and cannot be solved manually within reasonable timeframes (days/weeks) - Develop problems requiring non-trivial reasoning chains in data processing, statistical analysis, feature engineering, predictive modeling, and insight extraction - Create deterministic problems with reproducible answers - avoid stochastic elements or require fixed random seeds for exact reproducibility - Base problems on real business challenges: customer analytics, risk assessment, fraud detection, forecasting, optimization, and operational efficiency - Design end-to-end problems spanning the complete data science pipeline (data ingestion → cleaning → EDA → modeling → validation → deployment considerations) - Incorporate big data processing scenarios requiring scalable computational approaches - Verify solutions using Python with standard data science libraries and statistical methods - Document problem statements clearly with realistic business contexts and provide verified correct answers
AI Lab
Munich, Germany
100% remote
New

Senior Factor 10 Developer (IPS / IPM) (m/f/d)

An insurance company in Nuremberg is looking for a Senior Factor 10 Developer with expertise in IPS and IPM. The project includes developing and optimizing software solutions in the insurance sector, focusing on high performance and reliability. The role requires solid knowledge of Factor 10 and its applications in the insurance industry. Key responsibilities: - Developing and optimizing applications with Factor 10, especially in IPS and IPM. - Collaborating with interdisciplinary teams to ensure seamless integration and functionality. - Analyzing and resolving complex technical issues. - Providing technical guidance and mentoring to junior developers. - Ensuring compliance with industry standards and best practices.
Insurance
Nuremberg, Germany
100% remote

IT Project Manager ServiceNow (Senior)

- A company in the energy and energy services sector is looking for an experienced IT project manager for a ServiceNow project. - The goal of the project is to lead and successfully implement an enterprise ServiceNow project focusing on ITSM and Customer Service Management (CSM). - The role includes planning, controlling, and ensuring a stable project flow in close collaboration with internal and external stakeholders. - Operational & strategic service management of the ServiceNow platform - Process ownership for ITSM and CSM (B2B & B2C) - Process design, governance & continuous optimization - Managing external providers and vendors - Monitoring, KPI analysis & deriving improvements - Ensuring stable platform operations
Energy
Germany
100% remote

Chemist with Python Experience (m/f/d)

GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join the platform as an AI Tutor in Chemistry, you’ll have the opportunity to collaborate on these projects. Although every project is unique, you might typically: - Generate prompts that challenge AI. - Define comprehensive scoring criteria to evaluate the accuracy of the AI’s answers. - Correct the model’s responses based on your domain-specific knowledge.
AI Lab
100% remote

Senior Web Developer (m/f/d)

- You develop modern, high-performance web frontends with React, TypeScript, HTML, and CSS - You implement responsive designs with a focus on accessibility and performance - You plan and execute unit and integration tests (for example with Playwright) - You troubleshoot issues in development, testing, or live environments
Media Company
Munich, Germany
100% remote

Sales Manager for a Media Company (m/f/d)

- Independent marketing of our brand portfolio through innovative licensing, brand and lifestyle collaborations to create unique brand experiences - Responsibility for defined industries within the sales team, including strategic development, identification and outreach of target customers as well as ongoing market and trend analyses - Building new partnerships by acquiring new licensing partners and supporting and developing existing licensees - Planning, managing and controlling the budget for the industries you’re responsible for with a strategic focus - Participation in relevant industry fairs to pick up trends and acquire potential partners - Adapting marketing materials and proposals to tailor them to individual customer needs - Acting as an interface & first point of contact for external sales partners, including assessing and coordinating sales potential - Promoting effective collaboration between sales and brand management - Maintaining and utilizing the CRM system as well as potential overviews
Media Company
Hamburg, Germany
100% remote

Biologist with Python Experience (m/f/d)

GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join the platform as an AI Tutor in Biology, you’ll have the opportunity to collaborate on these projects. Although every project is unique, you might typically: - Generate prompts that challenge AI. - Define comprehensive scoring criteria to evaluate the accuracy of the AI’s answers. - Correct the model’s responses based on your domain-specific knowledge.
AI Lab
100% remote

Frontend Developer with Angular Experience for an HR Platform

Reach out to us if you are interested in working with us on the project.
FRATCH
Munich
90% remote

AI Agent Evaluation Analyst (m/f/d)

New
Industry
Information Technology (IT)
Areas
Quality Assurance (QA)
Research and Development (R&D)

Project info

  • Daily rate
    from 280€
  • Language
    • English
      (Advanced)
  • Remote
    100%

Description

We’re on the hunt for QAs for autonomous AI agents for a new project focused on validating and improving complex task structures, policy logic, and agent evaluation frameworks. Throughout the project, you’ll have to balance quality assurance, research, and logical problem-solving. This project opportunity is ideal for people who enjoy looking at systems holistically and thinking through scenarios, implications, and edge cases.

You do not need a coding background, but you must be curious, intellectually rigorous, and capable of evaluating the soundness and consistency of complex setups. If you’ve ever excelled in things like consulting, CHGK, Olympiads, case solving, or systems thinking — you might be a great fit.

What you’ll be doing:

  • Reviewing evaluation tasks and scenarios for logic, completeness, and realism.
  • Identifying inconsistencies, missing assumptions, or unclear decision points.
  • Helping define clear expected behaviors (gold standards) for AI agents.
  • Annotating cause-effect relationships, reasoning paths, and plausible alternatives.
  • Thinking through complex systems and policies as a human would to ensure agents are tested properly.
  • Working closely with QA, writers, or developers to suggest refinements or edge case coverage.

Requirements

  • Excellent analytical thinking: Can reason about complex systems, scenarios, and logical implications.
  • Strong attention to detail: Can spot contradictions, ambiguities, and vague requirements.
  • Familiarity with structured data formats: Can read (but not necessarily write) JSON/YAML.
  • Can assess scenarios holistically: What’s missing, what’s unrealistic, what might break?
  • Good communication and clear writing (in English) to document your findings.

We also value applicants who have:

  • Experience with policy evaluation, logic puzzles, case studies, or structured scenario design.
  • Background in consulting, academia, olympiads (e.g. logic/math/informatics), or research.
  • Exposure to LLMs, prompt engineering, or AI-generated content.
  • Familiarity with QA or test-case thinking (edge cases, failure modes, “what could go wrong”).
  • Some understanding of how scoring or evaluation works in agent testing (precision, coverage, etc.).