Prototyped & validated concept: Independently developed a fully self-coded PoC for an innovative marketing asset generator using OpenAI’s GPT Image 1, and set up Langfuse for evaluation.
Led implementation & launch: Directed the engineering team in building the production version and collaborated with the product head to define and execute the go-to-market strategy.
Leveraged reasoning LLMs to design, optimize, and validate advanced prompt engineering strategies for complex use cases. Implemented Langfuse to establish robust prompt management and integrated AI evaluation pipelines for continuous performance monitoring.
Self-organized and results-driven AI Evaluation Engineer & Project Manager with 11+ years of experience leading software development teams and 3+ years specializing in generative AI solutions. Proven expertise in LLM evaluation using LangSmith, Langfuse, and Weights & Biases, alongside strong capabilities in cloud platforms (AWS, GCP, Azure) and Agile project management. Skilled at shipping innovative AI applications and building robust evaluation pipelines to ensure quality, reliability, and measurable impact. Seeking a challenging remote role to deliver high-performing AI products and drive meaningful business results through technical depth and collaborative leadership.