Stephan Sahm

Senior ML/Data/Cloud Engineer

Munich, Germany

Experience

Oct 2021 - Present
3 years 5 months
Munich, Germany

Senior Data/ML Consultant & Technical Lead

Jolin.io

Industries included marketing, retail, trade, automation & aviation.

  • Mathematical optimization for scheduling: The constraint-optimization problem for a flexible scheduling tool was built, together with its interaction with the frontend. Role: Software Engineer & Applied Mathematician. Duration: 1 month. Team setting: Team of 2, remote. Technologies: JuMP, julia, Pluto, svelte, javascript, typescript, jetbrains space, terraform, nomad

  • Building scalable Data Science compute cluster from scratch: For the product cloud.jolin.io, a data science compute cluster was built on top of Kubernetes. This included securing container runtimes, authorization of running jobs, autoscaling, integration with version control, and building a front-end to spawn individual scalable data science environments. Role: Software & Cloud & Web Engineer. Duration: 11 months. Team setting: Team of 1, on-site. Technologies: terraform, kubernetes, k8s ingress, k8s services, k8s rbac, k8s networking, k3s, etcd, s3, dns, certificates, julia, pluto, javascript, tailwind, astro, npm, parcel, preact, mui, jwt, aws sqs, aws rds, python, GitLab, GitHub

  • Custom ChatGPT service: A state-of-the-art user interface was built on top of generative AI, including backend and frontend. Role: AI & Web Engineer. Duration: 1 month. Team setting: Team of 2, remote. Technologies: python, poetry, langchain, tailwind, chatgpt api, flask, fastapi

  • Central datalake setup and ingestion: The client had several different data sources for customers and orders that needed to be centralized for several use cases. As a solution, a big data lake was set up on top of AWS S3 and Apache Hudi, and the first data ingestion pipelines were completed. In production. Role: Architect & Data Engineer. Duration: 9 months. Team setting: Team of 5, remote. Technologies: Infrastructure-as-code, aws cdk, python, boto3, PySpark, AWS Glue, IAM, S3, ECS, Fargate, Lambda, Apache Hudi, DeltaLake, Databricks, GitHub, Jira, Miro

  • PoC Julia migration of scikit-decide: Scikit-decide is an open source tool by Airbus for reinforcement learning and scheduling. The core was translated from Python to Julia to demonstrate feasibility and benchmark performance improvements. Role: Software Engineer. Duration: 1 month. Team setting: Team of 2, remote. Technologies: python, julia, GitHub

Jan 2019 - Mar 2021
2 years 3 months
Munich, Germany

Senior Data Science Consultant & Technical Lead

Machine Learning Reply

Industries included automotive.

  • Supporting Use Case Development on a Datalake: Guidance was provided for architectural decisions, adapting access policies, and debugging routing issues. A specific GDPR treatment ingestion process was implemented and rolled out. In production. Role: Lead Developer & Architect. Duration: 6 months. Team setting: Team Lead, Team of 2, remote. Technologies: Infrastructure-as-code, cloudformation, sceptre, python, boto3, PySpark, scala, Spark, AWS Glue, AWS Secrets, AWS IAM, S3, SNS, Kubernetes, AWS VPC, AWS Networking, GitHub, Jira

  • 20 ETL Pipelines on AWS: Replacing a CRM required the development of about 20 ETL pipelines to replace existing systems with new data flows, including one REST API. In production. Role: Lead Developer & Architect. Duration: 10 months. Team setting: Team Lead, Team of 3, remote. Technologies: AWS Glue, PySpark, python, boto3, pandas, AWS SNS, AWS SQS, SQL, MySQL, PostgreSQL, MongoDB, AWS DocumentDB, Salesforce, AWS API Gateway, AWS Cognito, AWS Lambda, infrastructure-as-code, cloudformation, sceptre, GitHub, Jira

  • Building Multitenant Datalake on AWS: Implemented from scratch a datalake platform on AWS, deployed in several countries with infrastructure-as-code as the key technology. A key focus was GDPR conformity. In production. Role: Lead Developer & Architect. Duration: 5 months. Team setting: Team Lead, Team of 2, remote with a few on-site workshops. Technologies: Infrastructure-as-code, cloudformation, sceptre, python, boto3, PySpark, scala, Spark, AWS SageMaker, AWS Glue, AWS Secrets, AWS IAM, S3, SNS, Lambda, Kubernetes, EKS, Kafka, MSK, AWS VPC, AWS Transit Gateway, AWS Networking, AWS EC2, AWS Session Manager, AWS CloudWatch, GitHub, Jira

Jul 2018 - Dec 2020
1 year 6 months
Munich, Germany

Senior Data Science & Engineering Consultant

Data Reply

Industries included loyalty program, telecommunication and clothing/accessories.

  • Unification of Existing Time Series Analytics: Several custom anomaly detection solutions for time series were refactored and unified into a generic framework which can easily be deployed to new use cases and new infrastructures (AWS tested). In production. Role: Core Developer. Duration: 9 months. Team setting: Team of 15, on-site, Scrum. Technologies: Python, PySpark, (PL)SQL, Hive, HBase, Oracle, Tableau, Nifi, Kubernetes, Docker, Azure, Gitlab

  • Recommender System: Designed, implemented, and deployed a Big Data recommendation system, now running in production for millions of daily customers. In production. Role: Data Science Developer. Duration: 7 months. Team setting: Team of 1, on-site, weekly reviews. Technologies: On-premise, R, Scala, SBT, Spark, Yarn, HDFS, Bitbucket, Jira, Grafana, Prometheus, Elastic Stack, Kibana

  • QA Review: Custom Data Science Framework: Infrastructure review and code review of a framework built by one of our customers. Role: Quality Assurance & Adviser. Duration: 2 months. Team setting: Team of 1, mixed remote & on-site. Technologies: R, AWS

  • Workshop: Developing with Apache Spark: Four one-day workshops at customer sites, two introductory and two advanced. Contents: performance optimization, monitoring, interfacing Scala-R-Python, best practices. Role: Teacher. Duration: 4x 1 day. Setting: Group of 15 persons, sole presenter. Technologies: R, Python, Spark

Nov 2016 - Jul 2018
1 year 9 months
Munich, Germany

Data Science Consultant

Data Reply

Industries included telecommunication, bonus program, and media.

  • Fraud Detection: Draft, development, implementation, evaluation, and deployment of an anomaly detection system to detect previously unknown types of fraud. Role: Data Science Developer. Duration: 14 months. Team Setting: Team of 1, on-site, review once every three months. Technologies: R, Scala, Spark, Yarn, Bitbucket, Jira, Elastic Stack, Kibana

  • Callcenter and Webcontent Optimization using Speech Analytics: A three-dimensional content detection system was set up for written conversations. Given only plain text, it identified customer-specific product entities, services, and problems. Role: Data Science Developer. Duration: 6 months. Team Setting: Team of 3, on-site, reviews every week. Technologies: Python, NLP, spacy, GitHub, Elastic Stack, Kibana

Sep 2015 - Sep 2016
1 year 1 month
Nijmegen, Netherlands

Student Employee

Trufflebit

Industries included municipal utilities, with forecasting of all kinds.

  • Web Visualization: Built a Django-based web dashboard with interactive Bokeh data analysis visualizations. Role: Web Developer. Duration: 4 months. Team Setting: Team of 1, remote, steady exchange with the CEO. Technologies: Python, Django, Bokeh, Gitlab

  • Data Parsing: Built a parser to extract time series data from customer-specific text data formats. Role: Python Developer. Duration: 8 months. Team Setting: Team of 1, remote, steady exchange with the CEO. Technologies: Python, PyParsing, Cython, Gitlab

Apr 2013 - Mar 2014
1 year
Osnabrück, Germany

Study Project

University of Osnabrück

Building an Autonomous Robot: Programmed a robot with wheels and arms to grab a muffin from the receptionist on the first floor, take the elevator, and bring it to the robotics lab. Role: Computer Vision & Object Recognition. Duration: 12 months. Team Setting: Team of 14, on-site, Scrum. Technologies: ROS, Gazebo, Python, C++, OpenCV, Git

Summary

Stephan Sahm is a full-stack lead data science consultant and cloud architect, specialized in big data, high performance, probabilistic computing, and scientific machine learning. Stephan Sahm brings 10+ years of experience in data science, 7+ years in consulting, and 2 years in leading small teams. He holds master's degrees in cognitive science and statistics and a bachelor's degree in informatics, all with outstanding results. He has worked with Big Data systems since 2017 and with cloud platforms since 2018. Stephan Sahm architected cloud-based data solutions for several businesses with AWS, Azure, and Kubernetes. He implemented several machine learning use cases in production using Python, R, Scala, and Julia. He worked for several big companies, like Adidas, Fielmann, O2, Payback, Telefonica, and VW.

Languages

German
Native
English
Advanced
Spanish
Advanced
Dutch
Elementary

Education

Sep 2014 - Sep 2016

Radboud University

M.Sc. · Applied Stochastics · Nijmegen, Netherlands

Oct 2013 - Aug 2015

University of Osnabrück

M.Sc. · Cognitive Science · Osnabrück, Germany

Oct 2011 - Dec 2015

University of Osnabrück

B.Sc. · Mathematics/Informatics · Osnabrück, Germany

Certifications & licenses

AWS Certified Solutions Architect – Professional

AWS Certified Solutions Architect – Associate

AWS Certified Machine Learning – Specialty

AWS Certified Big Data – Specialty

AWS Certified Cloud Practitioner

Cloudera Certified CCA Spark and Hadoop Developer

Professional Scrum Master™ I (PSM I)