Built a video metadata pipeline (detector, tracker, frame extraction, and analysis)
Developed a pre-classifier using OpenAI GPT-4.1 Vision and AWS Rekognition
Performed keyframe analysis with OpenAI GPT-4 Vision and Azure Video Indexer
Designed and implemented a RAG architecture with LangChain and pgVector (OpenAI embeddings); see the sketch after this project's tool list
Implemented a long-form video understanding and activity recognition module with LongVU
Built a backend API with FastAPI for RAG-based video search (OpenAI, LangChain, AWS Bedrock)
Created voice-over using Google Text-to-Speech
Set up Terraform and a GPU cluster for LongVU with LLM backends (Llama 3.2, Qwen)
Added an LLM prompt monitoring and testing layer with OpenAI and Helicone
Project: Developed a zero-shot RAG application to automate short-form content for broadcasters (ORF, BR, SWR, Red Bull Studio)
End users: Editorial teams in Germany and Austria using the AI-based editorial tool AIDitor
Tools and methods: OpenAI, Llama, Ollama, Python, FastAPI, Azure Speech Services, Azure Video Indexer, AWS (S3, Lambda, Rekognition, Bedrock, ECS, EC2), Postgres, pgVector, LangChain, PyTorch, YOLOv10, WhisperX, LongVU, Helicone, Docker, Google Gemini API, Terraform
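A minimal sketch of the RAG retrieval described above, assuming the langchain-postgres integration and OpenAI embeddings; the connection string, collection name, and embedding model are placeholders, not the production configuration:

```python
# Minimal RAG sketch: pgvector store + OpenAI LLM (names are placeholders).
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_postgres import PGVector

# Hypothetical Postgres DSN; the real deployment details differ.
CONNECTION = "postgresql+psycopg://user:pass@localhost:5432/videometa"

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")  # assumed model
store = PGVector(
    embeddings=embeddings,
    collection_name="video_segments",  # hypothetical collection name
    connection=CONNECTION,
)

def search_videos(query: str, k: int = 5) -> str:
    """Retrieve the k most similar video-segment descriptions and let
    the LLM compose an answer grounded only in that context."""
    docs = store.similarity_search(query, k=k)
    context = "\n".join(d.page_content for d in docs)
    llm = ChatOpenAI(model="gpt-4.1")
    return llm.invoke(
        f"Answer using only this video metadata:\n{context}\n\nQuestion: {query}"
    ).content
```

In the project, a function like this sat behind the FastAPI search endpoint.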
Provided technical consulting and greenfield development for AVIATAR 2.0
Designed data pipelines and data architecture (DataVerse)
Hands-on development of end-to-end data pipelines in PySpark Structured Streaming and Databricks (see the sketch after this project's tool list)
Automated documentation for all data pipelines using an Azure OpenAI LLM and Azure AI Search: indexed production PySpark scripts and SQL notebooks via RAG to enable full-text search, business-logic checks, and Confluence documentation
Optimized Delta Lake tables (liquid clustering, Photon)
Delivered data for AI and ML projects (predictive analytics)
Coordinated with the data platform team for infrastructure scaling and monitoring
Worked closely with stakeholders across Lufthansa Technik for requirements management
Built a testing framework in PySpark and automated release pipelines with Azure DevOps
Project: Greenfield development of DataVerse as a digital twin for real-time parts logistics monitoring
Business goal: Real-time insights, reporting, and stock measurement worldwide
Tools and methods: Databricks, Azure Dedicated SQL, Azure Synapse Analytics, Azure AI Search, Azure OpenAI Service (GPT-4o), Azure Event Hubs, Azure DevOps CI/CD, PySpark, Power BI, Microsoft Purview, Python, Kafka, Oracle, Testing, Jira, Confluence, Scrum
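One of the streaming pipelines can be sketched as follows, assuming ingestion via the Kafka-compatible endpoint of Azure Event Hubs into a Delta table; the event schema, topic, and table names are illustrative:

```python
# Sketch of a Structured Streaming pipeline on Databricks (illustrative names).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("dataverse-parts-events").getOrCreate()

# Hypothetical schema for part-movement events.
schema = StructType([
    StructField("part_id", StringType()),
    StructField("station", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "NAMESPACE.servicebus.windows.net:9093")
       .option("subscribe", "parts-events")  # hypothetical topic
       # SASL auth options for Event Hubs omitted for brevity.
       .load())

events = (raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Write to a Delta table; the checkpoint gives exactly-once semantics.
(events.writeStream
 .format("delta")
 .option("checkpointLocation", "/mnt/checkpoints/parts-events")
 .toTable("dataverse.parts_events"))
```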
Migrated on-premise services to AWS
Developed Python mapping scripts to transform large test data sets from software and hardware tests (sketch below)
Built Spring Boot backends and React frontends for self-service tools
Converted database scripts from Oracle PL/SQL to PostgreSQL
Provisioned infrastructure with Terraform and deployed Docker containers
Project: Moved the development environment to AWS, transferring historical Xray test data without loss
Business goal: Streamline IT and data landscape and cut costs
Tools and methods: AWS, Terraform, Java, Python, SQL, Docker, Kubernetes, Helm charts, GitLab, GitLab CI/CD, AWS EKS, AWS Glue, AWS Lambda, AWS CloudWatch, Spring Boot, Spring Data, Jira, Confluence, Scrum
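A minimal sketch of one such mapping script, assuming CSV exports and a hypothetical field and status mapping; the real test-data formats and rules differed:

```python
# Chunked mapping of large test-result exports (all names are placeholders).
import pandas as pd

FIELD_MAP = {"TC_ID": "test_case_key", "RESULT": "status", "TS": "executed_on"}
STATUS_MAP = {"0": "PASS", "1": "FAIL", "2": "BLOCKED"}

def transform(src: str, dst: str, chunksize: int = 100_000) -> None:
    """Stream the export in chunks, rename fields, normalize status
    codes, and append the result to the target file."""
    first = True
    for chunk in pd.read_csv(src, chunksize=chunksize, dtype=str):
        chunk = chunk.rename(columns=FIELD_MAP)
        chunk["status"] = chunk["status"].map(STATUS_MAP).fillna("UNKNOWN")
        chunk.to_csv(dst, mode="w" if first else "a", header=first, index=False)
        first = False

transform("legacy_results.csv", "xray_import.csv")
```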
Led a team of external contractors to expand the AWS setup and CI/CD pipelines with GitLab CI
Provisioned infrastructure with Terraform (IaC), implemented EMR cluster auto-scaling in Python, and automated DNS via Route 53 (sketch below)
Supported migration of on-premise MySQL databases to AWS RDS and of a 10 TB data warehouse to AWS Aurora
Migrated Active Directory to AWS IAM Identity Center and moved on-premise Airflow and Kafka to AWS MWAA and AWS MSK
Designed and implemented transient clusters and Savings Plans to reduce AWS costs
Enhanced Spring Boot microservices for data queries, invoicing, and partner portal (Kubernetes, Docker)
Implemented KMS encryption with customer-managed keys (CMKs)
Presented project progress and cost forecasts to C-level stakeholders
Project: Cloud transition and build of a scalable data platform with Airflow, Databricks Delta, Kafka, and AWS services following Scrum
Tools and methods: AWS, Terraform, Apache Airflow, Python, PySpark, MySQL, Docker, Kubernetes, Helm charts, GitLab, GitLab CI, Presto, EMR, Hadoop, Hive, AWS (S3, Aurora, MSK, MWAA, RDS, Lambda, Athena, Glue, CloudWatch), Grafana, Jira, Confluence, Scrum
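The scaling and DNS automation can be sketched with boto3; the cluster ID, capacity bounds, and record names below are placeholders:

```python
# Sketch: EMR managed scaling plus a stable DNS name for the master node.
import boto3

emr = boto3.client("emr", region_name="eu-central-1")
r53 = boto3.client("route53")

# Attach a managed scaling policy so the cluster grows and shrinks
# with load within fixed instance bounds.
emr.put_managed_scaling_policy(
    ClusterId="j-XXXXXXXXXXXXX",  # placeholder cluster ID
    ManagedScalingPolicy={
        "ComputeLimits": {
            "UnitType": "Instances",
            "MinimumCapacityUnits": 3,
            "MaximumCapacityUnits": 20,
        }
    },
)

def upsert_dns(zone_id: str, name: str, master_ip: str) -> None:
    """Point a stable Route 53 record at the current cluster master."""
    r53.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "A",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": master_ip}],
                },
            }]
        },
    )
```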
Built data pipelines as Airflow DAGs in Python and PySpark (see the DAG sketch after this project's tool list)
Extended the data warehouse data model
Implemented Kafka Streams (KStreams) and Spark Structured Streaming to process event streams from Kafka
Introduced Airflow as a custom Docker image
Set up a highly available Airflow deployment on Kubernetes with Grafana monitoring
Deployed Confluent Enterprise with Kafka Connect, partitioners, single message transforms (SMTs), and Schema Registry
Created a Python framework to support new ETL pipelines in Airflow
Developed Spring Boot microservices for data access (GDPR)
Designed a new data lake structure with data lineage and data governance
Migrated SQL Data Warehouse ETL from Talend to Airflow and Databricks Delta on AWS S3
Project: Analyzed and optimized event processing performance and migrated analytic functions to the cloud using Scrum
Tools and methods: Apache Airflow, Python, Java, MySQL, Apache Spark, Redis, Docker, Kubernetes, Helm charts, GitLab, GitLab CI, Confluent Enterprise, Presto, EMR, Hadoop, Hive, AWS (S3, Lambda, Athena, Glue, CloudWatch), Talend Big Data, Grafana, Jira, Scrum
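A representative DAG in the style of the framework above, assuming the Airflow Spark provider package; the DAG ID, schedule, and job path are hypothetical:

```python
# Sketch of one ETL DAG; names and paths are illustrative.
from datetime import datetime
from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="dwh_orders_daily",          # hypothetical pipeline name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Run the PySpark transformation that loads staging data
    # into the warehouse model.
    load_orders = SparkSubmitOperator(
        task_id="load_orders",
        application="/opt/etl/jobs/load_orders.py",  # hypothetical job path
        conn_id="spark_default",
    )
```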
Advised and supported on Apache Spark, hyperscaling, and multi-tenancy for streaming applications
Trained SAP teams on Spark Structured Streaming
Guided migration from Spring Boot microservices and Kafka to a streaming architecture
Helped design a scalable streaming aggregation engine for time series data (sketch below)
Coded data pipelines in Scala
Supported Java and Scala development for metering, fault tolerance, and tenant partitioning on AWS/Cloud Foundry
Project: Optimized a big data architecture for IoT time series data with a streaming Lambda architecture
Tools and methods: Scala, SAP HANA, Cassandra, Kafka, Avro, Spark, Cloud Foundry on AWS, Amazon Kinesis, Amazon EMR, Hazelcast, Redis, Java
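The aggregation engine itself was written in Scala; as an illustration of the pattern, here is a PySpark analogue of a windowed time-series aggregation over Kafka, with topic, schema, and window sizes assumed:

```python
# PySpark analogue of the Scala streaming aggregation (illustrative names).
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("meter-aggregation").getOrCreate()

schema = StructType([
    StructField("meter_id", StringType()),
    StructField("value", DoubleType()),
    StructField("ts", TimestampType()),
])

readings = (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "broker:9092")
            .option("subscribe", "meter-readings")   # hypothetical topic
            .load()
            .select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
            .select("r.*"))

# Tumbling 15-minute windows per meter, tolerating 10 minutes of late data.
aggregated = (readings
              .withWatermark("ts", "10 minutes")
              .groupBy(F.window("ts", "15 minutes"), "meter_id")
              .agg(F.avg("value").alias("avg_value"),
                   F.count("*").alias("n_readings")))

query = (aggregated.writeStream
         .outputMode("update")
         .format("console")   # console sink for the sketch only
         .start())
```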
Built a streaming data solution with Apache Kafka and Spark as a PoC
Processed and enriched credit card transactions from TSYS/Seeburger via Kafka and Spark
Designed HBase tables, row-key schemas, and the architecture for distributed components (sketch below)
Implemented duplicate flagging and transaction tracking for service agents
Delivered enriched data to SAP systems
Developed Spring Boot microservices for electronic invoicing on Kubernetes and Azure
Project: Built an event-driven big data architecture with an enterprise data lake in Azure (Cloudera stack) using Scrum
Tools and methods: Cloudera Enterprise (Kafka, Hadoop, Spark), Azure Cloud, HBase, OpenShift, Spring Boot, Kubernetes, Helm charts, Docker, Maven, JUnit, Mockito, Hibernate, SAP, Oracle PL/SQL
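The row-key design can be illustrated with a short sketch; the salting scheme and field order are assumptions for illustration, not the production schema:

```python
# Illustrative composite row key for the HBase transaction tables.
import hashlib

def row_key(card_hash: str, ts_epoch_ms: int, txn_id: str) -> bytes:
    """Salt with a hash prefix to spread writes across regions, then
    order by card and inverted timestamp so recent transactions scan
    first; the trailing transaction ID guarantees uniqueness."""
    salt = hashlib.md5(card_hash.encode()).hexdigest()[:2]
    inv_ts = 10**13 - ts_epoch_ms  # inverted timestamp, newest first
    return f"{salt}|{card_hash}|{inv_ts:013d}|{txn_id}".encode()
```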
Implemented IT security workflows and compliance for 2,000 users in Atlassian JIRA
Designed and built a pre-processing environment with Kafka and Spark (clustering, ML, NLTK) for heuristic analysis of security reports and automatic CVE assignment (sketch below)
Developed Java plugins for Atlassian JIRA to extend workflows and REST APIs
Designed and built Oracle 11g back-end databases for IT asset management and pre-processing
Project: Implemented a process for handling security vulnerabilities across 10,000+ Deutsche Börse systems, with CISO reporting, using Scrum
Tools and methods: Atlassian JIRA, Java, Oracle 11g, PL/SQL, Python, Apache Spark, Kafka, ML, NLTK, Bash, Red Hat Linux, Apache Tomcat, Scrum, Kanban
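A minimal sketch of the clustering step, shown here with NLTK stopwords and scikit-learn (an assumption; only ML and NLTK are named above) on placeholder report texts; the production pipeline ran on Spark with additional features:

```python
# Cluster similar security reports so they map to the same CVE candidate.
import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

nltk.download("stopwords", quiet=True)

reports = [
    "Buffer overflow in OpenSSL allows remote code execution",
    "SQL injection in login form of internal portal",
    "Heap overflow in OpenSSL TLS handshake",
]  # placeholder report texts

vectorizer = TfidfVectorizer(stop_words=stopwords.words("english"))
X = vectorizer.fit_transform(reports)

labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
for text, label in zip(reports, labels):
    print(label, text[:60])
```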
Supported implementation of Atlassian Jira, Confluence, and DevTools for multiple clients
Administered client systems and continuous delivery pipelines with Jira, Bitbucket, and Bamboo
Developed plugins for Atlassian Stash for merge checks and build hooks
Implemented agile workflows in Atlassian Jira and Bitbucket (e.g., GitFlow)
Designed and built a quality assurance platform with Jenkins, Gitblit, Gerrit, and Groovy scripting
Defined a Git workflow based on GitFlow
Project: Atlassian product integration and development of a platform for distributed agile teams
Tools and methods: Atlassian Jira, Confluence, Bamboo, Stash, Java, Groovy, Bash, Maven, SUSE Linux