During a break from client projects, I proactively used the time to enhance my skills and knowledge:
AWS Certified Cloud Practitioner: Attained this certification, validating foundational cloud knowledge and proficiency in core AWS services.
New Programming Language: Mastered the foundations of Rust as a new programming language through intensive self-study, including reading books, completing online courses, and contributing to open-source projects.
Serverless Web Application (In Development): As a personal project this year, I am developing a serverless web application using AWS and TypeScript/React in combination with D3. Although the application is still under active development, I am prepared to provide a comprehensive demo or walkthrough of its current functionality and future potential. The architecture integrates RDS (Relational Database Service) for database management, Lambda for serverless compute, Polly for text-to-speech functionality, and Amplify for deployment and hosting, with the infrastructure managed via CDK. This ongoing project has given me valuable hands-on experience in architecting serverless applications on AWS.
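To illustrate the infrastructure-as-code side, here is a minimal CDK sketch of such a stack. It is shown in Python for consistency with the other snippets in this profile (the project itself defines its stacks in TypeScript and also uses Amplify for hosting, which is omitted here); all resource names, sizing choices, and the handler path are illustrative assumptions rather than the project's actual configuration.

```python
from aws_cdk import App, RemovalPolicy, Stack
from aws_cdk import aws_ec2 as ec2
from aws_cdk import aws_iam as iam
from aws_cdk import aws_lambda as _lambda
from aws_cdk import aws_rds as rds
from constructs import Construct


class ServerlessAppStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Network for the managed PostgreSQL instance.
        vpc = ec2.Vpc(self, "AppVpc", max_azs=2)

        # Relational storage for application data (RDS).
        database = rds.DatabaseInstance(
            self,
            "AppDatabase",
            engine=rds.DatabaseInstanceEngine.postgres(
                version=rds.PostgresEngineVersion.VER_15
            ),
            vpc=vpc,
            instance_type=ec2.InstanceType.of(
                ec2.InstanceClass.BURSTABLE3, ec2.InstanceSize.MICRO
            ),
            removal_policy=RemovalPolicy.DESTROY,
        )

        # Serverless compute for the API backend.
        api_handler = _lambda.Function(
            self,
            "ApiHandler",
            runtime=_lambda.Runtime.PYTHON_3_11,
            handler="app.handler",
            code=_lambda.Code.from_asset("backend"),
            vpc=vpc,
        )

        # Allow the backend to call Polly for text-to-speech and read the DB credentials.
        api_handler.add_to_role_policy(
            iam.PolicyStatement(actions=["polly:SynthesizeSpeech"], resources=["*"])
        )
        if database.secret:
            database.secret.grant_read(api_handler)


app = App()
ServerlessAppStack(app, "ServerlessAppStack")
app.synth()
```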
Continuous Learning: I consistently stay updated with industry trends and best practices through self-education, webinars, and online workshops. Recently, I delved into the Data Mesh approach and explored FastAPI.
As a data engineer, I developed the replacement for existing clustering batch jobs for a software company belonging to a leading German multinational automotive manufacturer. The clustering uses location data and additional metadata from electric charging stations distributed worldwide.
As part of the company's team responsible for POI data management (POI=point of interest), I collaborated closely with the developers who previously worked on this task and utilised my expertise with Spark and Airflow to design and implement the new solution using Databricks on Azure.
In collaboration with another team member, I refactored the existing Airflow pipeline for the batch jobs and its organically grown, complex business logic. We also refactored the library code used for the clustering and preprocessing steps; while doing so, I added missing documentation and improved code quality using defensive programming techniques.
Based on the acquired domain knowledge and the business requirements for the clustering, I built a new solution in the Azure cloud using Databricks. Tasks included integrating existing data from AWS S3 and Azure Blob Storage, developing library code, and building a Spark job for the geospatial clustering of charging station data.
While the team mainly works with Python, the new solution also uses the official open-source Scala library for graph clustering. For this, I used my experience with Scala and JVM-based development to make the library callable from Python using a suitable wrapper class.
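As a rough illustration of that wrapper pattern, the sketch below calls a JVM-side clustering entry point from PySpark through the py4j gateway. The package, class, method, and parameter names are placeholders, not the real library's API.

```python
from pyspark.sql import DataFrame, SparkSession


class GraphClustering:
    """Thin Python facade for a JVM-side Scala clustering entry point."""

    def __init__(self, spark: SparkSession):
        self._spark = spark
        # py4j exposes classes on the Spark driver's JVM via attribute chaining;
        # the package and class name here are placeholders.
        self._jlib = spark.sparkContext._jvm.com.example.clustering.StationClustering

    def cluster_stations(self, stations: DataFrame, max_distance_m: float) -> DataFrame:
        # Hand the underlying Java DataFrame (_jdf) to the Scala side and wrap the
        # returned Java DataFrame back into a PySpark DataFrame. Recent PySpark
        # accepts a SparkSession here; older versions expect a SQLContext.
        jresult = self._jlib.cluster(stations._jdf, float(max_distance_m))
        return DataFrame(jresult, self._spark)
```

Usage would then look like GraphClustering(spark).cluster_stations(stations_df, 250.0), with the Scala library's JAR added to the Spark session's classpath.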
At the beginning of the project, I also worked with the developers and testing team to eliminate bugs and increase the test coverage for an event-driven service. This Azure Functions-based service detects and removes certain privacy-related information within streams of vehicle signals. Redis is used to cache intermediary results detected in the event streams.
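For context, a stripped-down sketch of how such an Event Hubs-triggered function might look with the Azure Functions v1 Python programming model (the binding configuration in function.json is omitted); the field names, cache key layout, and the list of privacy-related fields are illustrative assumptions, not the service's actual logic.

```python
import json
import os
from typing import List

import azure.functions as func
import redis

# Connection string comes from the Function App settings; the setting name is illustrative.
cache = redis.Redis.from_url(os.environ["REDIS_CONNECTION_STRING"])

# Fields treated as privacy-related; purely illustrative.
PRIVATE_FIELDS = ("latitude", "longitude")


def main(events: List[func.EventHubEvent]) -> None:
    for event in events:
        signal = json.loads(event.get_body().decode("utf-8"))

        # Cache intermediary results so repeated signals are not reprocessed.
        cache_key = f"processed:{signal['vehicle_id']}:{signal['sequence_id']}"
        if cache.exists(cache_key):
            continue

        # Drop privacy-related fields before the signal is forwarded downstream.
        cleaned = {k: v for k, v in signal.items() if k not in PRIVATE_FIELDS}
        cache.set(cache_key, json.dumps(cleaned), ex=3600)
```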
Other tasks: Contribution to code reviews, PI planning, testing, and documentation.
Skills and Technologies:
Programming Languages: Python, Scala, SQL
Build and Dependency Management: Poetry, SBT
Data Processing and Analysis: PySpark, Spark, GraphX
Cloud Services: Azure (Databricks, Blob Storage, Functions) and AWS (S3)
Workflow Orchestration: Airflow
In-memory Data Store: Redis
Version Control and CI/CD: Git, Jenkins, Bitbucket
As a Data Engineer, I managed data and machine learning models for automated customer relationship management. Our cross-functional team of data scientists and engineers operated under SAFe. The team delivered data insights and predictive models used to execute multi-message campaigns and CRM initiatives for a user base of around 7 million.
Designing and constructing ETL pipelines for contract and usage data, and generating features for machine learning models.
Implementing machine learning pipelines in the cloud to support artificial intelligence use cases such as churn scoring, next best actions, and customer behaviour prediction.
Collaborating on the transition of Data Scientist-developed models from experimentation to production.
Automating data exports and score transfers into the central event bus for consumption by other services.
Establishing data quality monitoring with Great Expectations to ensure the accuracy, completeness, and consistency of incoming data (see the sketch after this list).
Collaborating with a fellow engineer on a spike to prepare subsequent MLOps steps, including assessing workflows and repositories and researching feature store integration.
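As a sketch of what such checks look like with the classic Great Expectations Dataset API (the API has changed considerably across GE versions), with illustrative column names and file paths:

```python
import great_expectations as ge
import pandas as pd

# Load the latest batch of contract data; the path is a placeholder.
contracts = pd.read_parquet("s3://example-bucket/contracts/latest.parquet")
batch = ge.from_pandas(contracts)

# Accuracy, completeness, and consistency checks on illustrative columns.
batch.expect_column_values_to_not_be_null("customer_id")
batch.expect_column_values_to_be_unique("contract_id")
batch.expect_column_values_to_be_between("monthly_fee", min_value=0, max_value=500)

# Fail the pipeline run if any expectation is not met
# (result objects differ slightly between GE versions).
results = batch.validate()
if not results.success:
    raise ValueError("Data quality validation failed for contract data")
```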
Skills and Technologies:
Programming Languages: Python, SQL
Data Processing and Analysis: Pandas, PySpark, NumPy
Machine Learning Libraries: Scikit-Learn, CatBoost
Data Quality Monitoring: Great Expectations
Cloud Services: AWS (S3, Kinesis, Athena, EMR, Glue)
Workflow Orchestration: Airflow
Containerization and Orchestration: Docker, Kubernetes
Infrastructure as Code: Terraform
Version Control and CI/CD: Git, GitLab
Supported the client's data engineering team with the conceptual and architectural preparation for extracting data from an external API into their AWS-based big data lake and Redshift data warehouse.
The API belongs to a SaaS platform used for campaign management and customer analytics, involving Natural Language Processing (NLP) topics such as sentiment analysis and phrase and keyword detection within customer text comments. Activities included:
Gathering and analysing requirements from project stakeholders.
Clarification of issues related to Natural Language Processing and the API design with a contact person from the SaaS platform and stakeholders.
Extension of documentation in Confluence.
Preparation of logical data model and conceptual design of the ETL processing pipeline.
Preparation of a proof of concept for streaming data from a Kafka cluster to the S3 layer in the AWS cloud using Databricks' Delta Lake framework (sketched below).
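The core of that proof of concept can be sketched as a structured streaming job that reads from Kafka and appends to a Delta table on S3. The snippet below is shown in PySpark for brevity (the PoC itself was written in Scala); broker addresses, topic names, and paths are placeholders, and it assumes the Kafka and Delta Lake packages are available on the Spark classpath.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("kafka-to-delta-poc")
    # Assumes the spark-sql-kafka and delta-spark packages are on the classpath.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)

raw_events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")  # placeholder broker
    .option("subscribe", "customer-comments")             # placeholder topic
    .option("startingOffsets", "earliest")
    .load()
)

# Kafka delivers key/value as binary; keep the payload as text for the raw zone.
payload = raw_events.selectExpr(
    "CAST(key AS STRING) AS key",
    "CAST(value AS STRING) AS value",
    "timestamp",
)

query = (
    payload.writeStream.format("delta")
    .option("checkpointLocation", "s3a://data-lake/checkpoints/customer-comments/")
    .outputMode("append")
    .start("s3a://data-lake/raw/customer-comments/")
)

query.awaitTermination()
```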
Skills and Technologies:
Programming Languages: Scala, SQL
Big Data Processing and Analytics: Spark, Kafka, Delta Lake, Redshift, Natural Language Processing (NLP)
Cloud Services: AWS (S3)
Responsible as a software developer for the development of an AWS-based cloud data lake divided into several zones, which also feeds a downstream data mart in Redshift that end users query for analytics via Tableau. Activities included:
Used Apache NiFi during the initial proof-of-concept phase for extracting data from source systems to S3 and Kinesis.
Developed several software services with Scala and Spark for the automated extraction and transformation (ETL) of data from different source systems and databases into, and within, the AWS cloud.
Implemented transformation logic for the creation of fact and dimension tables (data modelling following Kimball).
Implemented ETL pipelines using Spark and Scala, based on specifications and blueprints from the data analytics department and existing Tableau Prep flows.
Developed, adjusted, and deployed programmatic workflows for scheduling Spark jobs on EMR clusters via Apache Airflow (see the DAG sketch after this list).
Transferred knowledge and mentored coworkers in Scala and Spark during team programming sessions.
Developed a custom mini-framework in Scala for working with Spark DataFrames in a type-safe fashion (built jointly via team programming). The framework facilitates the development and testing of transformation components used in Spark jobs within ETL pipelines.
Integrated new data sources and replaced old ones.
Developed integration and unit tests, and performed debugging and execution testing.
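The DAG sketch below shows the submission pattern for such a workflow with the Airflow Amazon provider: add a spark-submit step to a running EMR cluster, then wait for it to finish. The cluster id, step definition, job class, artifact path, and schedule are illustrative placeholders, and the snippet assumes a recent Airflow 2.x installation.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

# spark-submit step definition; job class and artifact path are placeholders.
SPARK_STEPS = [
    {
        "Name": "daily_fact_build",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                "--class", "com.example.etl.FactBuilder",
                "s3://example-artifacts/etl-assembly.jar",
            ],
        },
    }
]

with DAG(
    dag_id="emr_spark_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Submit the Spark step to an already running EMR cluster.
    add_step = EmrAddStepsOperator(
        task_id="add_spark_step",
        job_flow_id="{{ var.value.emr_cluster_id }}",  # cluster id from an Airflow Variable
        steps=SPARK_STEPS,
        aws_conn_id="aws_default",
    )

    # Block until the submitted step has finished.
    wait_for_step = EmrStepSensor(
        task_id="wait_for_spark_step",
        job_flow_id="{{ var.value.emr_cluster_id }}",
        step_id="{{ task_instance.xcom_pull(task_ids='add_spark_step', key='return_value')[0] }}",
        aws_conn_id="aws_default",
    )

    add_step >> wait_for_step
```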
Skills and Technologies:
Programming Languages: Python, Scala, SQL
Build and Dependency Management: Poetry, SBT
Data Processing and Analysis: PySpark, Spark, GraphX
Cloud Services: AWS (EMR, S3, SSM, Kinesis, Redshift)
Data Integration and Workflow Orchestration: Airflow, NiFi
Databases: MS SQL Server, SAP HANA
Containerization and Orchestration: Docker, Kubernetes
Version Control and CI/CD: GitHub, Jenkins, Bitbucket
Artifact Management: JFrog Artifactory
As a freelancer, I build individual solutions with a focus on Data and Machine Learning Engineering.
codecentric AG is a German IT services provider headquartered in Solingen, with 15 locations in Germany and other European countries. The company develops custom software solutions for its clients and employs more than 450 experts in agile software development.
At codecentric, I worked as an engineer in a business unit focused on Data Science and Machine Learning and contributed to related customer and in-house projects.
DIGITEC GmbH is a German software provider based in Hamburg. It develops, sells, and supports software used by clients in the finance and banking industry for trading in the money and foreign exchange markets.
As a developer in a Scrum team, I implemented features for the upcoming version of the company's flagship desktop software suite.