DANNY R.

AI Evaluation Consultant

Anahuac, United States

Experience

Jan 2024 - Present
1 year 11 months

AI Evaluation Consultant

Freelance

  • Audited complex AI agent task graphs for logical completeness, policy consistency, and failure modes.
  • Developed structured gold-standard responses and banded rubrics to evaluate reasoning quality and compliance.
  • Annotated JSON-formatted task objects, defining state transitions, triggers, and invalid paths.
  • Collaborated with QA teams to expand edge-case coverage, improving test precision and raising inter-rater agreement to κ ≈ 0.8.
  • Created evaluation templates that reduced analysis time per task by 25%.
Jan 2021 - Dec 2024
4 years

Policy & Systems Analyst

Independent Projects

  • Researched autonomous decision-making frameworks and drafted validation cases for risk assessment and scenario testing.
  • Identified contradictions and ambiguous rules in multi-step processes using structured logic trees.
  • Produced concise findings reports to guide policy adjustments and system improvements.
Jan 2019 - Dec 2021
3 years

Research Assistant

Decision Modeling Lab

  • Supported studies on algorithmic reasoning and human error detection in complex decision systems.
  • Designed evaluation scenarios to benchmark model performance and document failure patterns.
  • Presented findings through structured analytical briefs and QA reports.

Summary

Methodical and intellectually curious AI Evaluation Consultant with expertise in reasoning analysis, scenario design, and quality assurance of autonomous AI systems. Skilled at identifying inconsistencies, underspecified logic, and unrealistic task flows in agent testing environments. Experienced in drafting clear evaluation rubrics, documenting gold-standard behaviors, and articulating cause-and-effect reasoning paths.

Languages

Japanese · Native
English · Advanced

Education

Western Governors University

Master of Science · Data Analytics & Logic Systems · United States

University of Minnesota

Bachelor of Science · Cognitive Science and Information Systems · United States
