Project details

Recommended projects

Evaluation Scenario Writer (m/f/d)

We’re looking for someone who can design realistic and structured evaluation scenarios for LLM-based agents. You’ll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You’ll work to ensure each scenario is clearly defined, well-scored, and easy to execute and reuse. You’ll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions. Although every project is unique, you might typically: - Design structured test scenarios based on real-world tasks. - Define the golden path and acceptable agent behavior. - Annotate task steps, expected outputs, and edge cases. - Work with devs to test your scenarios and improve clarity. - Review agent outputs and adapt tests accordingly.
100% remote

AI Evaluation Consultant (m/f/d)

We are seeking an analytical and technically minded professional to: - Evaluate AI outputs and processes - Ensure quality, accuracy, and reliability - Identify logical errors, risks, and structural inconsistencies - Provide actionable insights and recommendations to the team Ideal candidates: - Consultants, auditors, analysts, data researchers, or business/technical analysts with strong reasoning skills - Professionals curious about AI, process improvement, and quality evaluation - Problem-solvers who enjoy analyzing complex systems, logic, and scenarios Key Responsibilities: - Lead evaluation of AI outputs and related processes - Review tasks against expected/ideal scenarios; identify gaps and risks - Provide structured, actionable recommendations to engineers, domain experts, and managers - Maintain and improve evaluation guidelines, checklists, and SOPs - Suggest new approaches, tools, and processes to enhance AI evaluation
AI Labs
100% remote

Freelance Chemistry Expert for AI Model Training (m/f/d)

An AI lab is looking for freelance chemistry experts to evaluate AI models. The goal of the project is to assess the performance, accuracy, and reliability of AI models applied in chemistry contexts. The role involves working closely with the development team to ensure the models meet industry standards and provide actionable insights. This is a remote part-time role that can be flexibly tailored to your availability – from just a few hours per week to full-time. Key responsibilities: - Evaluate AI models for chemistry applications. - Analyze model outputs and provide feedback for improvement. - Collaborate with the development team to ensure alignment with industry standards. - Document findings and recommendations for model optimization. - Conduct tests to validate model performance and reliability.
AI Lab
100% remote

Freelance Biology Expert for AI Model Training (m/f/d)

An AI lab is looking for freelance biology experts to evaluate AI models. The goal of the project is to assess the performance, accuracy, and reliability of AI models applied across all areas of biology. The role involves working closely with the development team to ensure the models meet industry standards and provide actionable insights. This is a remote part-time role that can be flexibly tailored to your availability – from just a few hours per week to full-time. Key responsibilities: - Evaluate AI models for biology applications. - Analyze model outputs and provide feedback for improvement. - Collaborate with the development team to ensure alignment with industry standards. - Document findings and recommendations for model optimization. - Conduct tests to validate model performance and reliability.
AI Lab
100% remote
New

Freelance Product Owner Android Development (m/f/d)

A company is looking for an experienced Product Owner to develop an Android app within a large organization. The focus is on leading and coordinating the development process, and a solid understanding of hardware is a plus. Main responsibilities: - Lead the development of an Android app in a corporate environment - Work closely with development teams and stakeholders - Ensure project goals and deadlines are met - Translate business requirements into technical specifications - Prioritize and maintain the product backlog
Tech conglomerate
Berlin, Germany
100% remote

Freelance Automotive Engineer (with Python) - Quality Assurance / AI Trainer

Generative AI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. Although every project is unique, you might typically: - Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in Physics, while also developing references and examples of tasks. - Experts Acquisition: Assess the qualification tests of experts, ensuring their competency. - Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines. - Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote

Freelance Physics Expert (with Python) - Quality Assurance / AI Trainer

Generative AI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. Although every project is unique, you might typically: - Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in Physics, while also developing references and examples of tasks. - Experts Acquisition: Assess the qualification tests of experts, ensuring their competency. - Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines. - Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote

Senior Project Manager Customer Interaction

A company is seeking support on a project to evaluate, implement, and further develop quality surveys in digital channels. The goal of the project is to increase customer satisfaction in digital channels by evaluating, implementing, and further developing survey methods that enable consistent measurement of customer satisfaction across all channels. Potential improvements should be identified and implemented. The role includes advising on, developing, and executing measures for measuring and improving customer satisfaction in digital channels. Main tasks: - Advising on survey methods for capturing customer experience and quality in digital channels, market standards, benchmarks, and future orientation. - Developing a future model for quality in digital channels, relevant KPIs and survey methods, as well as standard processes. - Implementing the decided measures, including interface management and coordination with technology partners and social partners. - Testing implemented measures for data collection and ensuring they meet the required criteria. - Consolidating and listing existing and missing customer survey methods/quality KPIs in all responsible digital channels. - Advising on the preparation of decision templates and implementing the necessary measures. - Identifying potential improvements and developing a standard process for transparency and implementation.
Telecommunication
Munich, Germany
100% remote

Project Manager Magazines / Magazine Production (m/f/d)

- Responsibility for coordinating and managing the entire production process of magazine publications - Planning and oversight of issue structure, schedules, adverts and workflows - Close collaboration with editorial, publishing management, marketing, technical, sales, printing plant and service providers - Quality assurance for layouts, texts and print approvals - Cost calculation and organization of supplementary products (e.g., inserts, posters, extensions) - Active role in strategic projects, conferences and the launch of new formats
Media Company
Munich, Germany
50% remote

Freelance Ruby Developer (m/f/d)

For an AI lab, we are looking for a Ruby developer to train an AI model (Large Language Model - LLM). You help AI to make sense of the world. As a consultant, you may be invited to take part in online projects to train the model in your domain of expertise. This flexible role accommodates both experts seeking part-time engagement (a minimum of a few hours per week) and those interested in full-time opportunities. - Code generation and code review - Prompt evaluation and complex data annotation - Training and evaluation of large language models - Benchmarking and agent-based code execution in sandboxed environments - Working across multiple programming languages (Python, JavaScript/TypeScript, Rust, SQL, etc.) - Adapting guidelines for new domains and use cases - Collaborating with project leads, solution engineers, and supply managers on complex or experimental projects
AI Lab
100% remote

ERP Transformation Manager (m/f/d)

A company is looking for an experienced ERP Transformation Manager who will take overall responsibility for planning and steering a comprehensive ERP transformation program. The project aims to harmonize processes, introduce a new ERP system, and implement IFRS requirements. The ERP Transformation Manager will analyze, redesign, and standardize the commercial core processes in civil engineering and track construction. This includes translating IFRS requirements into system structures and posting logic, closely coordinated with the Finance, Controlling, Project Management, and IT departments. The role covers steering the ERP implementation, including fit-gap analyses, process design, test management, and migration. In addition, a unified reporting and KPI framework for group reporting and project control will be established. The manager will act as the central interface between operational units, finance, executive management, and the group, and will set up a sustainable change and training concept for users. - Planning and steering the ERP transformation program (IFRS transition, process harmonization, ERP rollout) - Analyzing, redesigning, and standardizing commercial core processes - Translating IFRS requirements into system structures and posting logic - Steering the ERP implementation including fit-gap analyses, process design, test management, and migration - Building a unified reporting and KPI framework - Stakeholder management and ensuring smooth communication - Leading interdisciplinary project teams and managing external consultants and implementation partners - Establishing a sustainable change and training concept - Ensuring measurable process improvements after the ERP system go-live
Infrastructure Construction
Eisenach, Germany
70% remote

Freelance Product Owner for Point Of Sale App

- Vision & Strategy: You shape the roadmap for the POS system of the future and ensure we not only meet but anticipate our customers' needs. - Multi-Platform Excellence: You manage the development of our app ecosystem with a focus on Android. You understand the challenges of different form factors (from the compact card terminal to the large stationary tablet). - Delivery Management: You lead your cross-functional development team with clarity and enthusiasm. You facilitate ceremonies, remove blockers, and celebrate releases. - Requirements Engineering: You translate complex business logic into precise technical requirements and user stories that leave no questions unanswered for developers.
POS Scale-Up
Berlin, Germany

Freelance Mechanical Engineer with Python Experience (m/f/d)

For an AI lab, we are looking for a mechanical engineer with Python experience to train an AI model (Large Language Model - LLM). GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join, you’ll have the opportunity to collaborate on these projects. Although every project is unique, you might typically: - Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in Mechanical Engineering, while also developing references and examples of tasks. - Experts Acquisition: Assess the qualification tests of experts, ensuring their competency. - Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines. - Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote

Freelance Electrical Engineer with Python Experience (m/f/d)

Generative AI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. Although every project is unique, you might typically: - Content Creation & Refinement: Create and refine content to ensure accuracy and relevance across a variety of topics in Physics, while also developing references and examples of tasks. - Experts Acquisition: Assess the qualification tests of experts, ensuring their competency. - Chat Moderation: Provide support by addressing project-related questions from other experts in Discord chats, especially those related to project guidelines. - Auditing Work: Review and evaluate tasks completed by other experts, ensuring they align with project guidelines. Provide constructive feedback, verify expertise-related information, and edit content as necessary to improve quality.
AI Studio
100% remote

Freelance Cybersecurity Consultant for AI Red Teaming

For an AI lab, we are looking for cybersecurity consultants to train an AI model (Large Language Model - LLM). You help AI to make sense of the world. As a consultant, you may be invited to take part in online projects to train the model in your domain of expertise. This flexible role accommodates both experts seeking part-time engagement (a minimum of a few hours per week) and those interested in full-time opportunities. - Evaluate and red team AI models, agents, and machine learning systems for vulnerabilities and safety risks. - Create offline, reproducible, and auto-evaluable test cases to test the safety and capability of AI agents. - Develop and implement automation scripts, custom tools, environments and test harnesses. - Lead or contribute to security research initiatives, especially in AI safety, creating and implementing realistic and challenging attack scenarios for the model. - Advise on cybersecurity best practices and policy implications.
AI Lab
100% remote

Developer for Consent Management Implementation (m/f/d)

The consent layers on the web that were previously provided by third-party CMPs for our international brands need to be reimplemented so that they can be maintained and deployed in-house. This requires solid knowledge of TypeScript, Vue.js and classic web technologies (HTML and CSS). The goal is to deliver executable code that implements all requirements and includes automated tests that verify correct functionality. Scope of the assignment: The main focus is on developing elements for decision templates on the approach and on implementing measures throughout the project as designed. This specifically includes the following service packages: - Implement code - Implement executable tests that must pass on delivery, test coverage >= 80% - Create documentation for the code - Create brand-specific cmp-config files - Create a project (including asset management requirements) as a copy of the Consent Management Platform - Remove netID references - Create brand-specific settings and files for custom purposes/vendors - Add new brand-specific CSS themes (variable values, logos, etc.) - Include the required official IAB GVL translations (ES, FR) in the weekly sync with GVL - Implement i18n and prepare brand-specific data sources - Implement PMC2.0 backend usage modules - Implement the playout logic - Implement the layer initialization process (mode=default and mode=resurface) - CDN upload and release process - Project documentation Project execution: - The deliverable should be written in TypeScript and Vue.js, built with Vite, tested with Vitest.
Telecommunication
Karlsruhe, Germany
100% remote

Freelance Java Developer (m/f/d)

For an AI lab, we are looking for a Java developer to train an AI model (Large Language Model - LLM). You help AI to make sense of the world. As a consultant, you may be invited to take part in online projects to train the model in your domain of expertise. This flexible role accommodates both experts seeking part-time engagement (a minimum of a few hours per week) and those interested in full-time opportunities. - Code generation and code review - Prompt evaluation and complex data annotation - Training and evaluation of large language models - Benchmarking and agent-based code execution in sandboxed environments - Working across multiple programming languages - Adapting guidelines for new domains and use cases - Following project-specific rubrics and requirements - Collaborating with project leads, solution engineers, and supply managers on complex or experimental projects
AI Lab
100% remote

Commissioning & Qualification (C&Q) Engineer (m/f/d)

A company is looking for an experienced Commissioning & Qualification (C&Q) Engineer to qualify and commission production facilities in line with GMP standards. The project goal is to ensure the technical and organizational requirements for the GMP-compliant qualification of the production facilities. - Independent execution of commissioning and qualification activities, especially in the area of IOQ - Operation of PCS7 systems - Working with single-use equipment - Execution of commissioning and qualification activities for production facilities - Ensuring all technical and organizational prerequisites for C&Q - GMP-compliant qualification of the associated production facilities
Pharma
Munich, Germany
100% remote

Freelance Editor (m/f/d)

- You create topic briefings, research and write well-founded (guide) texts in a sophisticated style and edit the contributions of our freelance authors - The topics are aimed at amateur gardeners and cover the garden and plant sector; living and interior design, design and decor, do-it-yourself, and cooking and nutrition are also part of the spectrum - In close exchange with colleagues, readers and experts, you develop exciting topics and tailor them to the target group - Maintaining and expanding press contacts, as well as ordering photo material for garden, living and decor, are also part of your tasks - Optionally, you organize and carry out photo shoots and attend press appointments and trade fairs
Media Company
Munich, Germany
50% remote

Senior Faktor 10 Developer (IPS / IPM) (m/f/d)

An insurance company in Nuremberg is looking for a Senior Faktor 10 developer with expertise in IPS and IPM. The project involves developing and optimizing software solutions in the insurance domain, focusing on high performance and reliability. The role requires solid knowledge of Faktor 10 and its applications in the insurance sector. Main tasks: - Develop and optimize applications with Faktor 10, especially in IPS and IPM. - Collaborate with cross-functional teams to ensure seamless integration and functionality. - Analyze and resolve complex technical issues. - Provide technical guidance and mentoring to junior developers. - Ensure compliance with industry standards and best practices.
Insurance Company
Nuremberg, Germany
100% remote

AI Consultants - Data Science (m/f/d)

We are seeking experienced data scientists to create computationally intensive data science problems for an advanced AI evaluation project. This is a remote, project-based opportunity for experts who can design challenging problems that require computational methods to solve and mirror the full data science lifecycle, from data acquisition and processing to statistical analysis and actionable business insights. What you'll do: - Design original computational data science problems that simulate real-world analytical workflows across industries (telecom, finance, government, e-commerce, healthcare) - Create problems requiring Python programming to solve (using pandas, numpy, scipy, sklearn, statsmodels, matplotlib, seaborn) - Ensure problems are computationally intensive and cannot be solved manually within reasonable timeframes (days/weeks) - Develop problems requiring non-trivial reasoning chains in data processing, statistical analysis, feature engineering, predictive modeling, and insight extraction - Create deterministic problems with reproducible answers (avoid stochastic elements or require fixed random seeds for exact reproducibility) - Base problems on real business challenges: customer analytics, risk assessment, fraud detection, forecasting, optimization, and operational efficiency - Design end-to-end problems spanning the complete data science pipeline (data ingestion → cleaning → EDA → modeling → validation → deployment considerations) - Incorporate big data processing scenarios requiring scalable computational approaches - Verify solutions using Python with standard data science libraries and statistical methods - Document problem statements clearly with realistic business contexts and provide verified correct answers
AI Lab
Munich, Germany
100% remote

IT Project Manager ServiceNow (Senior)

- A company in the energy and energy services sector is looking for an experienced IT project manager for a ServiceNow project. - The goal of the project is to lead and successfully implement an enterprise ServiceNow project with a focus on ITSM and Customer Service Management (CSM). - The role includes planning, controlling, and ensuring a stable project process in close collaboration with internal and external stakeholders. - Operational & strategic service management of the ServiceNow platform - Process ownership for ITSM and CSM (B2B & B2C) - Process design, governance & continuous improvement - Management of external providers and vendors - Monitoring, KPI analysis & deriving improvements - Ensuring stable platform operations
Energy
Germany
100% remote

IT Project Manager ISO 27001 - Gap Closure (m/f/d)

A company in the automotive supplier industry is looking for support in the field of cyber security. The goal of the project is to close gaps as part of the ISO 27001 certification. The IT project manager will play a central role in managing and monitoring the gap closure measures. - Managing and monitoring gap closure measures. - Consistently tracking tasks, deadlines, and responsibilities. - Coordinating between IT, business units, information security, and, if necessary, external service providers. - Ensuring that measures are implemented in an ISO-27001-compliant, verifiable, and documented manner. - Providing transparent status reports to program management and stakeholders. - Supporting audit preparation (evidence, action status, maturity level).
Munich, Germany
20% remote
New

Freelance Post-Merger Integration Consultant with a Strong Tech and Commercial Focus (m/f/d)

An organization is seeking an experienced post-merger integration consultant with a strong focus on technology and commercial aspects. The goal of the project is to ensure a successful integration after a merger or acquisition by addressing technical and business challenges. The role requires someone with proven leadership experience at larger technology companies, capable of taking on strategic and operational tasks. Key Responsibilities: - Leading and coordinating post-merger integration processes - Analyzing and optimizing technical and commercial structures - Developing and implementing integration strategies - Collaborating with internal and external stakeholders - Ensuring adherence to timelines and budgets
Tech
Berlin, Germany
60% remote

Chemist with Python Experience (m/f/d)

GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join the platform as an AI Tutor in Chemistry, you’ll have the opportunity to collaborate on these projects. Although every project is unique, you might typically: - Generate prompts that challenge AI. - Define comprehensive scoring criteria to evaluate the accuracy of the AI’s answers. - Correct the model’s responses based on your domain-specific knowledge.
AI Lab
100% remote

Senior Web Developer (m/f/d)

- You develop modern, high-performing web frontends with React, TypeScript, HTML and CSS - You implement responsive designs with a focus on accessibility and performance - You plan and execute unit and integration tests (for example with Playwright) - You troubleshoot issues in development, test, or live environments
Media Company
Munich, Germany
100% remote

Sales Manager for a Media Company (m/f/d)

- Independently marketing our brand portfolio through innovative licensing, branding, and lifestyle collaborations to create unique brand experiences - Responsible for defined industries in the sales team, including strategic development, identifying and approaching target customers, and ongoing market and trend analysis - Building new partnerships by acquiring new licensing partners and managing and developing existing licensees - Planning, managing, and controlling the budget for the responsible industries with a strategic focus - Participating in relevant industry trade shows to catch trends and attract potential partners - Adjusting marketing materials and proposals to tailor them to individual customer needs - Acting as a liaison & first point of contact for external sales partners, including evaluating and aligning sales potentials - Promoting effective collaboration between sales and brand management - Maintaining and using the CRM system and opportunity overviews
Media Company
Hamburg, Germany
100% remote

Biologist with Python Experience (m/f/d)

GenAI models are improving fast, and one of our goals is to make them able to handle specialized questions and develop complex reasoning skills. If you join the platform as an AI Tutor in Biology, you’ll have the chance to collaborate on these projects. Although every project is unique, you might typically: - Generate prompts that challenge the AI. - Define clear scoring criteria to judge the accuracy of the AI’s answers. - Correct the model’s responses based on your domain knowledge.
AI Lab
100% remote

AI Consultant - Machine Learning (m/f/d)

For an AI lab, we are looking for machine learning experts to train an AI model (Large Language Model - LLM). GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join, you’ll have the opportunity to collaborate on these projects. Although every project is unique, you might typically: - Design original computational STEM problems that simulate real scientific workflows - Create problems that require Python programming to solve - Ensure problems are computationally intensive and cannot be solved manually within reasonable timeframes (days/weeks) - Develop problems requiring non-trivial reasoning chains and creative problem-solving approaches - Verify solutions using Python with standard libraries (numpy, pandas, scipy, sklearn) - Document problem statements clearly and provide verified correct answers
AI Lab
100% remote

AI Consultant for Vibe Coding (m/f/d)

An AI Lab is looking for an AI Trainer for Vibe Coding. This role involves producing accurate, well-reasoned outputs across diverse domains, leveraging automation and AI tools. The position requires expertise in coding and optimizing Python scripts, handling large datasets, improving AI-generated content, and formatting and troubleshooting technical workflows. This is a remote part-time role that can be flexibly tailored to your availability – from just a few hours per week to full-time. Key responsibilities: - Conduct advanced web research and data mining using multiple tools to locate and extract information from official sources. Use LLMs and advanced prompts to refine search strategies and validate data accuracy by cross-referencing authoritative sources. - Perform web scraping and data extraction by navigating complex website structures and multi-level pages (regions → companies → detailed pages). Handle dynamic content, archived pages, and various HTML formats, and organize extracted data into clean, well-formatted CSV files. - Write and optimize Python scripts for data processing and analysis using libraries such as pandas, BeautifulSoup, Selenium, and matplotlib. Transform raw data into structured formats (CSV, JSON, tables) and create visualizations when required. - Carry out data processing and quality assurance by cleaning, validating, and structuring datasets. - Ensure data integrity across multiple sources, apply formatting specifications, and run verification steps to maintain high output quality. - Apply strong problem-solving and task execution skills to break down complex workflows, troubleshoot technical issues independently, and adapt quickly between different domains and task types with minimal supervision. - Produce clear documentation and high-quality outputs that follow exact requirements for file formats, naming conventions, and data structure. Maintain reproducible workflows and well-organized code.
AI Lab
100% remote

Frontend Developer with Angular Experience for an HR Platform

Reach out to us if you are interested in working with us on the project.
FRATCH
Munich
90% remote

Time's up! We are no longer accepting applications.

AI Agent Evaluation Analyst (m/f/d)

Industry
Information Technology (IT)
Areas
Quality Assurance (QA)
Research and Development (R&D)

Project info

  • Daily rate
    from 280€
  • Language
    English (Advanced)
  • Remote
    100%

Description

We’re looking for QA analysts to evaluate autonomous AI agents on a new project focused on validating and improving complex task structures, policy logic, and agent evaluation frameworks. Throughout the project, you’ll balance quality assurance, research, and logical problem-solving. This opportunity is ideal for people who enjoy looking at systems holistically and thinking through scenarios, implications, and edge cases.

You do not need a coding background, but you must be curious, intellectually rigorous, and capable of evaluating the soundness and consistency of complex setups. If you’ve ever excelled in things like consulting, CHGK (the “What? Where? When?” quiz game), Olympiads, case solving, or systems thinking, you might be a great fit.

What you’ll be doing:

  • Reviewing evaluation tasks and scenarios for logic, completeness, and realism.
  • Identifying inconsistencies, missing assumptions, or unclear decision points.
  • Helping define clear expected behaviors (gold standards) for AI agents.
  • Annotating cause-effect relationships, reasoning paths, and plausible alternatives.
  • Thinking through complex systems and policies as a human would to ensure agents are tested properly.
  • Working closely with QA, writers, or developers to suggest refinements or edge case coverage.

Requirements

  • Excellent analytical thinking: Can reason about complex systems, scenarios, and logical implications.
  • Strong attention to detail: Can spot contradictions, ambiguities, and vague requirements.
  • Familiarity with structured data formats: Can read (but not necessarily write) JSON/YAML.
  • Can assess scenarios holistically: What’s missing, what’s unrealistic, what might break?
  • Good communication and clear writing (in English) to document your findings.

We also value applicants who have:

  • Experience with policy evaluation, logic puzzles, case studies, or structured scenario design.
  • Background in consulting, academia, olympiads (e.g. logic/math/informatics), or research.
  • Exposure to LLMs, prompt engineering, or AI-generated content.
  • Familiarity with QA or test-case thinking (edge cases, failure modes, “what could go wrong”).
  • Some understanding of how scoring or evaluation works in agent testing (precision, coverage, etc.).