Stephan Sahm

Senior ML/Data/Cloud Engineer

Munich, Germany

Experience

Oct 2021 - Present
3 years 5 months
Munich, Germany

Senior Data/ML Consultant & Technical Lead

Jolin.io

Industries included marketing, retail, trade, automation & aviation.

  • Mathematical optimization for scheduling: The constraint-optimization problem for a flexible scheduling tool was built, together with its interaction with the frontend. Role: Software Engineer & Applied Mathematician. Duration: 1 month. Team setting: Team of 2, remote. Technologies: JuMP, julia, Pluto, svelte, javascript, typescript, jetbrains space, terraform, nomad

  • Building scalable Data Science compute cluster from scratch: For the product cloud.jolin.io, a data science compute cluster was built on top of Kubernetes. This included securing container runtimes, authorization of running jobs, autoscaling, integration with version control, and building a front-end to spawn individual scalable data science environments. Role: Software & Cloud & Web Engineer. Duration: 11 months. Team setting: Team of 1, on-site. Technologies: terraform, kubernetes, k8s ingress, k8s services, k8s rbac, k8s networking, k3s, etcd, s3, dns, certificates, julia, pluto, javascript, tailwind, astro, npm, parcel, preact, mui, jwt, aws sqs, aws rds, python, GitLab, GitHub

  • Custom ChatGPT service: A state-of-the-art user interface was built on top of generative AI, including backend and frontend. Role: AI & Web Engineer. Duration: 1 month. Team setting: Team of 2, remote. Technologies: python, poetry, langchain, tailwind, chatgpt api, flask, fastapi

  • Central datalake setup and ingestion: The client had several different data sources for customers and orders that needed to be centralized for several use cases. As a solution, a big data lake was set up on top of AWS S3 and Apache Hudi, and the first data ingestion pipelines were completed. In production. Role: Architect & Data Engineer. Duration: 9 months. Team setting: Team of 5, remote. Technologies: Infrastructure-as-code, aws cdk, python, boto3, PySpark, AWS Glue, IAM, S3, ECS, Fargate, Lambda, Apache Hudi, DeltaLake, Databricks, GitHub, Jira, Miro

  • PoC Julia migration of scikit-decide: Scikit-decide is an open source tool by Airbus for reinforcement learning and scheduling. The core was translated from Python to Julia to demonstrate feasibility and benchmark performance improvements. Role: Software Engineer. Duration: 1 month. Team setting: Team of 2, remote. Technologies: python, julia, GitHub

Jan 2019 - Mar 2021
2 years 3 months
Munich, Germany

Senior Data Science Consultant & Technical Lead

Machine Learning Reply

Industries included automotive.

  • Supporting Use Case Development on a Datalake: Guidance was provided for architectural decisions, adapting access policies, and debugging routing issues. A specific GDPR treatment ingestion process was implemented and rolled out. In production. Role: Lead Developer & Architect. Duration: 6 months. Team setting: Team Lead, Team of 2, remote. Technologies: Infrastructure-as-code, cloudformation, sceptre, python, boto3, PySpark, scala, Spark, AWS Glue, AWS Secrets, AWS IAM, S3, SNS, Kubernetes, AWS VPC, AWS Networking, GitHub, Jira

  • 20 ETL Pipelines on AWS: Replacing a CRM required the development of about 20 ETL pipelines to replace existing systems with new data flows, including one REST API. In production. Role: Lead Developer & Architect. Duration: 10 months. Team setting: Team Lead, Team of 3, remote. Technologies: AWS Glue, PySpark, python, boto3, pandas, AWS SNS, AWS SQS, SQL, MySQL, PostgreSQL, MongoDB, AWS DocumentDB, Salesforce, AWS API Gateway, AWS Cognito, AWS Lambda, infrastructure-as-code, cloudformation, sceptre, GitHub, Jira

  • Building Multitenant Datalake on AWS: Implemented from scratch a datalake platform on AWS, deployed in several countries with infrastructure-as-code as the key technology. A key focus was GDPR conformity. In production. Role: Lead Developer & Architect. Duration: 5 months. Team setting: Team Lead, Team of 2, remote with a few on-site workshops. Technologies: Infrastructure-as-code, cloudformation, sceptre, python, boto3, PySpark, scala, Spark, AWS SageMaker, AWS Glue, AWS Secrets, AWS IAM, S3, SNS, Lambda, Kubernetes, EKS, Kafka, MSK, AWS VPC, AWS Transit Gateway, AWS Networking, AWS EC2, AWS Session Manager, AWS CloudWatch, GitHub, Jira

Jul 2018 - Dec 2020
1 year 6 months
Munich, Germany

Senior Data Science & Engineering Consultant

Data Reply

Industries included loyalty program, telecommunication and clothing/accessories.

  • Unification of Existing Time Series Analytics: Several custom anomaly detection solutions for time series were refactored and unified into a generic framework which can easily be deployed to new use cases and new infrastructures (AWS tested). In production. Role: Core Developer. Duration: 9 months. Team setting: Team of 15, on-site, Scrum. Technologies: Python, PySpark, (PL)SQL, Hive, HBase, Oracle, Tableau, Nifi, Kubernetes, Docker, Azure, Gitlab

  • Recommender System: Designed, implemented, and deployed a Big Data recommendation system, now running in production for millions of daily customers. In production. Role: Data Science Developer. Duration: 7 months. Team setting: Team of 1, on-site, weekly reviews. Technologies: On-premise, R, Scala, SBT, Spark, Yarn, HDFS, Bitbucket, Jira, Grafana, Prometheus, Elastic Stack, Kibana

  • QA Review: Custom Data Science Framework: Infrastructure review and code review of a framework built by one of our customers. Role: Quality Assurance & Adviser. Duration: 2 months. Team setting: Team of 1, mixed remote & on-site. Technologies: R, AWS

  • Workshop: Developing with Apache Spark: Four one-day workshops at customer sites, two introductory and two advanced. Contents: performance optimization, monitoring, interfacing Scala-R-Python, best practices. Role: Teacher. Duration: 4x 1 day. Setting: Group of 15 persons, sole presenter. Technologies: R, Python, Spark

Nov 2016 - Jul 2018
1 year 9 months
Munich, Germany

Data Science Consultant

Data Reply

Industries included telecommunication, bonus program, and media.

  • Fraud Detection: Draft, development, implementation, evaluation, and deployment of an anomaly detection system to detect previously unknown types of fraud. Role: Data Science Developer. Duration: 14 months. Team Setting: Team of 1, on-site, review once every three months. Technologies: R, Scala, Spark, Yarn, Bitbucket, Jira, Elastic Stack, Kibana

  • Callcenter and Webcontent Optimization using Speech Analytics: A three-dimensional content detection system was set up for written conversations. Given only plain text, it identified customer-specific product entities, services, and problems. Role: Data Science Developer. Duration: 6 months. Team Setting: Team of 3, on-site, reviews every week. Technologies: Python, NLP, spacy, GitHub, Elastic Stack, Kibana

Sep 2015 - Sep 2016
1 year 1 month
Nijmegen, Netherlands

Student Employee

Trufflebit

Industries included municipal utilities, with forecasting of all kinds.

  • Web Visualization: Built a Django-based web dashboard with interactive Bokeh data analysis visualizations. Role: Web Developer. Duration: 4 months. Team Setting: Team of 1, remote, steady exchange with the CEO. Technologies: Python, Django, Bokeh, Gitlab

  • Data Parsing: Built a parser to extract time series data from customer-specific text data formats. Role: Python Developer. Duration: 8 months. Team Setting: Team of 1, remote, steady exchange with the CEO. Technologies: Python, PyParsing, Cython, Gitlab

Apr 2013 - Mar 2014
1 year
Osnabrück, Germany

Study Project

University of Osnabrück

Building an Autonomous Robot: Programmed a robot with wheels and arms to grab a muffin from the receptionist on the first floor, take the elevator, and bring it to the robotics lab. Role: Computer Vision & Object Recognition. Duration: 12 months. Team Setting: Team of 14, on-site, Scrum. Technologies: ROS, Gazebo, Python, C++, OpenCV, Git

Summary

Stephan Sahm is a full-stack lead data science consultant and cloud architect, specialized in big data, high performance, probabilistic computing, and scientific machine learning. Stephan Sahm brings 10+ years of experience in data science, 7+ years in consulting, and 2 years in leading small teams. He holds master's degrees in cognitive science and statistics and a bachelor's degree in informatics, all with outstanding results. He has worked with Big Data systems since 2017 and with cloud platforms since 2018. Stephan Sahm architected cloud-based data solutions for several businesses with AWS, Azure, and Kubernetes. He implemented several machine learning use cases in production using Python, R, Scala, and Julia. He worked for several big companies, like Adidas, Fielmann, O2, Payback, Telefonica, and VW.

Languages

German
Native
English
Advanced
Spanish
Advanced
Dutch
Elementary

Education

Sep 2014 - Sep 2016

Radboud University

M.Sc. · Applied Stochastics · Nijmegen, Netherlands

Oct 2013 - Aug 2015

University of Osnabrück

M.Sc. · Cognitive Science · Osnabrück, Germany

Oct 2011 - Dec 2015

University of Osnabrück

B.Sc. · Mathematics/Informatics · Osnabrück, Germany

Certifications & licenses

AWS Certified Solutions Architect – Professional

AWS Certified Solutions Architect – Associate

AWS Certified Machine Learning – Specialty

AWS Certified Big Data – Specialty

AWS Certified Cloud Practitioner

Cloudera Certified CCA Spark and Hadoop Developer

Professional Scrum Master™ I (PSM I)