Industries included marketing, retail, trade, automation & aviation.
Mathematical optimization for scheduling: The constraint-optimization problem for a flexible scheduling tool was built, along with its interaction with the frontend. Role: Software Engineer & Applied Mathematician. Duration: 1 month. Team setting: Team of 2, remote. Technologies: JuMP, julia, Pluto, svelte, javascript, typescript, jetbrains space, terraform, nomad
Building a scalable Data Science compute cluster from scratch: For the product cloud.jolin.io, a data science compute cluster was built on top of Kubernetes. This included securing container runtimes, authorizing running jobs, autoscaling, integrating with version control, and building a front-end to spawn individual scalable data science environments. Role: Software & Cloud & Web Engineer. Duration: 11 months. Team setting: Team of 1, on-site. Technologies: terraform, kubernetes, k8s ingress, k8s services, k8s rbac, k8s networking, k3s, etcd, s3, dns, certificates, julia, pluto, javascript, tailwind, astro, npm, parcel, preact, mui, jwt, aws sqs, aws rds, python, GitLab, GitHub
Custom ChatGPT service: A state-of-the-art user interface was built on generative AI, including backend and frontend. Role: AI & Web Engineer. Duration: 1 month. Team setting: Team of 2, remote. Technologies: python, poetry, langchain, tailwind, chatgpt api, flask, fastapi
Central datalake setup and ingestion: The client had several different data sources for customers and orders, which were to be centralized for several use cases. As a solution, a big data lake on top of AWS S3 and Apache Hudi was set up and the first data ingestion pipelines were completed. In production. Role: Architect & Data Engineer. Duration: 9 months. Team setting: Team of 5, remote. Technologies: Infrastructure-as-code, aws cdk, python, boto3, PySpark, AWS Glue, IAM, S3, ECS, Fargate, Lambda, Apache Hudi, DeltaLake, Databricks, GitHub, Jira, Miro
PoC Julia migration of scikit-decide: Scikit-decide is an open source tool by Airbus for reinforcement learning and scheduling. The core was translated from Python to Julia to demonstrate feasibility and benchmark performance improvements. Role: Software Engineer. Duration: 1 month. Team setting: Team of 2, remote. Technologies: python, julia, GitHub
Industries included automotive.
Supporting Usecase Development on Datalake: Guidance was provided for architectural decisions, adapting access policies, and debugging routing issues. A specific GDPR treatment for ingestion processes was implemented and rolled out. In production. Role: Lead Developer & Architect. Duration: 6 months. Team setting: Team Lead, Team of 2, remote. Technologies: Infrastructure-as-code, cloudformation, sceptre, python, boto3, PySpark, scala, Spark, AWS Glue, AWS Secrets, AWS IAM, S3, SNS, Kubernetes, AWS VPC, AWS Networking, GitHub, Jira
20 ETL Pipelines on AWS: Replacing a CRM required the development of about 20 ETL pipelines to replace existing systems with new data flows, including one REST API. In production. Role: Lead Developer & Architect. Duration: 10 months. Team setting: Team Lead, Team of 3, remote. Technologies: AWS Glue, PySpark, python, boto3, pandas, AWS SNS, AWS SQS, SQL, MySQL, PostgreSQL, MongoDB, AWS DocumentDB, Salesforce, AWS API Gateway, AWS Cognito, AWS Lambda, infrastructure-as-code, cloudformation, sceptre, GitHub, Jira
Building Multitenant Datalake on AWS: A datalake platform on AWS was implemented from scratch and deployed in several countries, using infrastructure-as-code as the key technology. A key focus was GDPR conformity. In production. Role: Lead Developer & Architect. Duration: 5 months. Team setting: Team Lead, Team of 2, remote with a few on-site workshops. Technologies: Infrastructure-as-code, cloudformation, sceptre, python, boto3, PySpark, scala, Spark, AWS SageMaker, AWS Glue, AWS Secrets, AWS IAM, S3, SNS, Lambda, Kubernetes, EKS, Kafka, MSK, AWS VPC, AWS Transit Gateway, AWS Networking, AWS EC2, AWS Session Manager, AWS CloudWatch, GitHub, Jira
Industries included loyalty program, telecommunication and clothing/accessories.
Unification of Existing Time Series Analytics: Several custom anomaly detection solutions on time series were refactored and unified into a generic framework that can be easily deployed to new use cases and new infrastructures (tested on AWS). In production. Role: Core Developer. Duration: 9 months. Team setting: Team of 15, on-site, Scrum. Technologies: Python, PySpark, (PL)SQL, Hive, HBase, Oracle, Tableau, Nifi, Kubernetes, Docker, Azure, GitLab
Recommender System: Designed, implemented, and deployed a big data recommendation system, now serving millions of daily customers. In production. Role: Data Science Developer. Duration: 7 months. Team setting: Team of 1, on-site, weekly reviews. Technologies: On-premise, R, Scala, SBT, Spark, Yarn, HDFS, Bitbucket, Jira, Grafana, Prometheus, Elastic Stack, Kibana
Q Review: Custom Datascience Framework: Infrastructure review and code review of a framework built by one of our customers. Role: Quality Assurance & Adviser. Duration: 2 months. Team setting: Team of 1, mixed remote & on-site. Technologies: R, AWS
Workshop: Developing with Apache Spark: Four one-day workshops at customers, two introductory, the other two advanced. Contents: Performance optimization, monitoring, interfacing Scala-R-Python, best practices. Role: Teacher. Duration: 4x 1 day. Setting: Group of 15 persons, sole presenter. Technologies: R, Python, Spark
Industries included telecommunication, bonus program, and media.
Fraud Detection: Design, development, implementation, evaluation, and deployment of an anomaly detection system to detect previously unknown types of fraud. Role: Data Science Developer. Duration: 14 months. Team Setting: Team of 1, on-site, review once every three months. Technologies: R, Scala, Spark, Yarn, Bitbucket, Jira, Elastic Stack, Kibana
Callcenter and Webcontent Optimization using Speech Analytics: A three-dimensional content detection system was set up for written conversations. Given only plain text, it identified customer-specific product entities, services, and problems. Role: Data Science Developer. Duration: 6 months. Team Setting: Team of 3, on-site, reviews every week. Technologies: Python, NLP, spacy, GitHub, Elastic Stack, Kibana
Industries included municipal utilities, with forecasts of all kinds.
Web Visualization: Built a Django-based web dashboard with Bokeh-based interactive visualizations for data analysis. Role: Web Developer. Duration: 4 months. Team Setting: Team of 1, remote, steady exchange with CEO. Technologies: Python, Django, Bokeh, GitLab
Data Parsing: Built a parser to extract time series data from customer-specific text data formats. Role: Python Developer. Duration: 8 months. Team Setting: Team of 1, remote, steady exchange with CEO. Technologies: Python, PyParsing, Cython, GitLab
Building an Autonomous Robot: Programmed a robot with wheels and arms to grab a muffin from the receptionist on the first floor, take the elevator, and bring it to the robotics lab. Role: Computer vision & Object recognition. Duration: 12 months. Team Setting: Team of 14, on-site, Scrum. Technologies: ROS, Gazebo, Python, C++, OpenCV, Git