Jorge M.

Data Architect

Würzburg, Germany

Experience

Mar 2025 - Present
6 months

Data Architect

Deutsche Bahn

  • Responsible for the cloud architecture of the new data processing system for the Finance division of Deutsche Bahn.
  • Worked with DBT, Dagster, Kubernetes, and Glue, and delivered a Databricks proof of concept.
  • Environment: Kubernetes, AWS.
  • Tools: DBT, Dagster, Redshift, AWS, Glue, S3, VS Code, Databricks.
  • Designed and provided best practices for data modeling in dbt.
  • Handled slowly changing dimensions, late-arriving data, and testing.
  • Designed the ingestion flow from other systems into S3 and Redshift.
  • Designed and implemented new partitions for Dagster and incremental loading with DBT (see the sketch after this list).
  • Mapped business requirements to technical architectures.
  • Trained and guided junior developers.
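
A minimal sketch of the Dagster partitioning approach referenced above; the asset name, table, and start date are illustrative placeholders, not project specifics:

    # Daily-partitioned Dagster asset feeding incremental loads (illustrative names).
    from dagster import AssetExecutionContext, DailyPartitionsDefinition, asset

    daily_partitions = DailyPartitionsDefinition(start_date="2025-01-01")

    @asset(partitions_def=daily_partitions)
    def finance_transactions(context: AssetExecutionContext) -> None:
        # Each run materializes exactly one day, so downstream dbt models
        # can be rebuilt incrementally per partition.
        day = context.partition_key  # e.g. "2025-03-15"
        context.log.info(f"Loading raw finance data for {day} into S3/Redshift")
        # load_day_from_source(day)  # placeholder for the actual ingestion step
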
Sep 2024 - Mar 2025
7 months

Data Architect expert

SAP AG

  • As a Data Architect, I designed and implemented a cloud-agnostic architecture to support Kafka Tiered Storage across multiple environments. This involved defining rollout workflows, automation pipelines, and infrastructure abstraction to ensure scalable, cost-effective, and maintainable data streaming capabilities.
  • Scaled a cluster to handle over 25,000 Kafka partitions.
  • Environment: Kubernetes, Azure, AWS, Google Cloud.
  • Tools: Kubernetes, Gardener, GitHub, Python, Go, Kafka, Jenkins, Helm.
  • Led the architectural design and implementation of Kafka Tiered Storage rollout across 30+ Kubernetes clusters in multi-cloud environments (Azure, AWS, GCP).
  • Defined and implemented infrastructure provisioning using Crossplane, enabling declarative and consistent infrastructure deployment across cloud providers.
  • Developed a custom Go-based Kafka Operator, abstracting platform complexity and standardizing the tiered storage activation process for data pipelines (see the sketch after this list).
  • Designed and automated GitOps-based deployment strategies using Flux and Helm, ensuring safe and repeatable rollout procedures.
  • Optimized Gardener shoot configurations to align cluster resources with Kafka workload and cost efficiency requirements.
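
A hedged sketch of the topic-level tiered storage activation step via the Kafka Admin API from Python; the broker address and topic name are placeholders, and the production rollout was driven by the Go-based operator and GitOps pipelines rather than ad-hoc scripts:

    from confluent_kafka.admin import AdminClient, ConfigResource

    admin = AdminClient({"bootstrap.servers": "kafka-broker:9092"})  # placeholder broker

    # KIP-405 topic-level flag; broker-side remote log storage must already be configured.
    resource = ConfigResource(
        ConfigResource.Type.TOPIC,
        "orders-events",  # illustrative topic name
        set_config={"remote.storage.enable": "true"},
    )

    for res, future in admin.alter_configs([resource]).items():
        future.result()  # raises if the broker rejects the config change
        print(f"Tiered storage enabled for {res}")
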
May 2024 - Nov 2024
7 months

Data Architect expert

s.Oliver GmbH

  • As the lead Data Architect, I was responsible for the end-to-end migration from SAP HANA to a modern, scalable Azure-based Databricks Lakehouse. This initiative involved redesigning the data architecture, implementing robust ETL pipelines, and introducing advanced analytics and AI capabilities — resulting in annual cost savings of over €50,000 by decommissioning legacy SAP infrastructure.
  • Environment: Databricks / Azure.
  • Tools & Technologies: Databricks, Azure Data Lake, SAP HANA, PySpark, DBT, Kafka, Azure DevOps, Delta Lake, Python, FP-Growth, Time Series Forecasting.
  • Designed a medallion architecture on Databricks to support scalable and modular data ingestion, transformation, and consumption.
  • Led the implementation of incremental ETL pipelines using PySpark to extract and process SAP data efficiently.
  • Architected and implemented DBT-based semantic layers, including dimensional modeling for fact and dimension tables.
  • Established Dev to Prod CI/CD pipelines to standardize deployment and enforce governance.
  • Defined role-based access control and security concepts aligned with enterprise Azure standards.
  • Enabled real-time data integration by connecting Kafka streams to Databricks, enriching analytical capabilities.
  • Introduced AI/ML use cases, including FP-Growth for basket analysis and time series forecasting models (see the sketch after this list).
  • Mentored junior developers on Databricks best practices, ensuring long-term platform adoption and team scaling.
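
A minimal sketch of the FP-Growth basket analysis mentioned above, using PySpark ML; the toy transactions are illustrative, not the actual retail data:

    from pyspark.sql import SparkSession
    from pyspark.ml.fpm import FPGrowth

    spark = SparkSession.builder.getOrCreate()

    # Illustrative baskets; production input came from curated sales data in the Lakehouse.
    baskets = spark.createDataFrame(
        [(1, ["jeans", "belt"]), (2, ["jeans", "shirt", "belt"]), (3, ["shirt"])],
        ["basket_id", "items"],
    )

    fp = FPGrowth(itemsCol="items", minSupport=0.3, minConfidence=0.6)
    model = fp.fit(baskets)

    model.freqItemsets.show()      # frequently co-purchased item sets
    model.associationRules.show()  # rules such as jeans -> belt with confidence and lift
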
Jan 2023 - Aug 2023
8 months

Data Architect expert

ias Gruppe

  • As a Data Architect, I was responsible for designing and implementing a modern, scalable Azure-based Data Lake architecture to support real-time data ingestion and processing from IoT and telemetry sources. The platform was built to provide structured, analytics-ready data for multiple departments, supporting both operational dashboards and advanced analytics.
  • Environment: Azure.
  • Tools & Technologies: Azure Synapse, Delta Lake, Azure Data Lake Gen2, Azure IoT Hub, Azure Event Hub, Azure Service Bus, Azure Data Factory, DBT, Airbyte, Power BI, Azure Monitor, Log Analytics, Python, SQL.
  • Architected an end-to-end Azure Data Lakehouse solution leveraging Azure Synapse, Delta Lake, and Azure Data Lake Storage Gen2, ensuring scalable and performant storage and query capabilities.
  • Designed and implemented streaming ingestion pipelines using Azure IoT Hub, Azure Event Hub, and Azure Service Bus, enabling real-time telemetry data capture from thousands of IoT devices.
  • Developed data integration and transformation flows using Airbyte for ELT and DBT for business logic modeling, dimensional design, and lineage tracking.
  • Orchestrated complex data workflows using Azure Data Factory, integrating batch and streaming processes into a unified data pipeline.
  • Implemented Delta Lake-based time travel and ACID transactions to ensure reliability and traceability of business-critical data (see the sketch after this list).
  • Designed role-based access control (RBAC), resource tagging strategies, and monitoring with Azure Monitor and Log Analytics, ensuring operational transparency and data security.
  • Enabled Power BI integration for stakeholders to explore data in near real-time and develop business dashboards.
  • Collaborated with product and operations teams to capture functional requirements and translate them into scalable data architecture patterns.
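
A short sketch of the Delta Lake time-travel reads used for traceability; the storage path, version, and timestamp values are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    path = "abfss://lake@storageaccount.dfs.core.windows.net/silver/telemetry"  # placeholder

    current = spark.read.format("delta").load(path)

    # Read the same table as of an earlier version or timestamp to audit changes.
    as_of_version = spark.read.format("delta").option("versionAsOf", 12).load(path)
    as_of_time = (
        spark.read.format("delta")
        .option("timestampAsOf", "2023-05-01 00:00:00")
        .load(path)
    )
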
Sep 2022 - May 2024
1 year 9 months
Frankfurt, Germany

Data Architect expert

Deutsche Bahn

  • As Data Architect, I led the development of a large-scale, cloud-native data platform on AWS for processing streaming and batch data in the transportation domain. The architecture enabled real-time analytics and delta ingestion into a multi-hundred-terabyte data lake, optimizing operations around train delays, departures, and predictive insights.
  • Environment: AWS.
  • Tools & Technologies: AWS Kinesis, TimescaleDB, AWS Glue, Apache Hudi, Lambda, S3, DBT, PostgreSQL, CDK, GitLab, Spark, Athena, CloudWatch.
  • Designed and implemented real-time streaming architectures using AWS Kinesis, Lambda, and Apache Spark to support time-sensitive analytics use cases.
  • Architected delta ingestion pipelines on AWS Glue and Apache Hudi, enabling efficient small file compaction and time travel analytics at scale (see the sketch after this list).
  • Delivered business-critical KPIs and dashboards, with end-to-end data lineage and auditability across S3, PostgreSQL, and CloudWatch.
  • Defined and enforced infrastructure-as-code (IaC) principles using AWS CDK, enabling scalable and replicable environments.
  • Introduced and rolled out DBT for semantic modeling and reusable business logic, integrating into CI/CD workflows with GitLab.
  • Conducted architectural evaluations of Databricks, Snowflake, and AWS Athena, providing decision support for future platform strategy.
  • Mentored a team of developers, optimizing development cycles and ensuring best practices in cloud data engineering.
  • Implemented IoT 4.0 pipelines for ingesting telemetry data and supporting predictive analytics initiatives.
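
A hedged sketch of the Hudi upsert pattern behind the delta ingestion pipelines; table name, keys, and paths are placeholders, and the Hudi Spark bundle must be on the cluster classpath:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Illustrative micro-batch of changed rows; production data arrived via Glue jobs.
    incoming_df = spark.createDataFrame(
        [("e-1", "2024-01-01T10:00:00", 3)], ["event_id", "event_ts", "delay_minutes"]
    )

    hudi_options = {
        "hoodie.table.name": "train_events",
        "hoodie.datasource.write.recordkey.field": "event_id",
        "hoodie.datasource.write.precombine.field": "event_ts",
        "hoodie.datasource.write.operation": "upsert",  # merge changes into existing files
    }

    (
        incoming_df.write.format("hudi")
        .options(**hudi_options)
        .mode("append")
        .save("s3://data-lake/raw/train_events/")
    )
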
Sep 2021 - Sep 2022
1 year 1 month
Rottendorf, Germany

Kafka Expert

s.Oliver GmbH

  • In this project we redesigned the complete purchase order and material chain, moving from batch processing to real-time processing with Kafka (see the sketch after this list).
  • Environment: Confluent Cloud and Azure.
  • Developing Spring Boot Kafka Streams applications.
  • Developing custom Kafka source connectors to extract data from SAP systems.
  • Developing custom Kafka sink connectors to write to SAP systems.
  • Deploying Kafka Connect connectors with monitoring into an Azure Kubernetes cluster.
  • Developing data pipelines using Airflow and the Azure cloud.
  • Developing the architecture for the data pipelines between on-premises systems and the Azure cloud.
  • Writing Spark jobs to clean and aggregate data.
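
An illustrative sketch of the batch-to-real-time idea: publishing a purchase-order change as a Kafka event instead of waiting for a nightly batch. Topic name and payload fields are placeholders; the production flow ran through Kafka Connect and Spring Boot Kafka Streams applications, and Confluent Cloud would additionally require SASL/SSL settings:

    import json
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "kafka-broker:9092"})  # placeholder broker

    order_event = {"po_number": "4500001234", "material": "MAT-001", "status": "CREATED"}
    producer.produce(
        topic="purchase-orders",
        key=order_event["po_number"],
        value=json.dumps(order_event),
    )
    producer.flush()  # block until the event is delivered
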
Mar 2021 - Jun 2021
4 months
Würzburg, Germany

Data Warehouse expert

Büro Forum GmbH

  • Developed a data warehouse for the Concept office ERP system.
  • Environment: Google Big Query and DBT.
  • Developing dbt workflows and star schemas for the data warehouse (see the sketch after this list).
  • Developing ELT workflows with Stitch Data.
  • Developing dashboards with Power BI in the Azure cloud.
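
A small sketch of querying the resulting star schema in BigQuery from Python; dataset, table, and column names are illustrative, and the dimensional models themselves were maintained with dbt:

    from google.cloud import bigquery

    client = bigquery.Client()  # uses application-default credentials

    sql = """
        SELECT d.customer_name, SUM(f.net_amount) AS revenue
        FROM dwh.fact_orders AS f
        JOIN dwh.dim_customer AS d ON f.customer_key = d.customer_key
        GROUP BY d.customer_name
        ORDER BY revenue DESC
        LIMIT 10
    """

    for row in client.query(sql).result():
        print(row.customer_name, row.revenue)
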
Feb 2021 - Aug 2022
1 year 7 months

Software Developer

RTL Deutschland

  • In this project, I architected and delivered a highly complex and compliance-driven data sharing platform on Microsoft Azure. The solution enabled secure, governed, and scalable access to sensitive business data across departments and partners, supporting both analytical and operational use cases.
  • Environment: Microsoft Azure.
  • Tools & Technologies: Azure Databricks, Azure Synapse, Delta Lake, Azure App Services, DBT, FastAPI, PySpark, Power BI, Azure DevOps, Azure Monitor, Azure Key Vault, Python, SQL.
  • Designed and implemented a Lakehouse architecture combining Azure Databricks, Delta Lake, and Azure Synapse to support both batch and real-time workloads with ACID compliance and scalable performance.
  • Built RESTful data APIs using FastAPI and deployed them securely via Azure App Services, providing a controlled access layer to the data platform (see the sketch after this list).
  • Developed incremental ETL pipelines using PySpark and DBT, implementing star schema models for semantic consistency, historical tracking, and governed self-service analytics.
  • Enabled interactive reporting and visual analytics using Power BI, directly integrated into the Azure ecosystem for performance and security compliance.
  • Implemented strong data access controls, audit logging, and resource monitoring to ensure compliance with GDPR and internal data governance policies.
  • Established automated deployment processes and CI/CD pipelines for data infrastructure components using Azure-native tooling.
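
A minimal sketch of the FastAPI access-layer pattern; the route, model, and in-memory catalog are hypothetical placeholders for the governed Lakehouse queries behind the real API:

    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel

    app = FastAPI(title="Data sharing API")

    class Dataset(BaseModel):
        name: str
        rows: int

    # Hypothetical catalog; production requests were served from the Lakehouse.
    CATALOG = {"viewing_stats": Dataset(name="viewing_stats", rows=1_250_000)}

    @app.get("/datasets/{name}", response_model=Dataset)
    def get_dataset(name: str) -> Dataset:
        if name not in CATALOG:
            raise HTTPException(status_code=404, detail="dataset not found")
        return CATALOG[name]
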
Sep 2020 - Jun 2021
10 months
Munich, Germany

Cloud Solution Architect

Allianz Technology

  • Migration of data lakes into the Azure cloud, with a high level of automation using ArgoCD, Jenkins, Helm charts, and Terraform. Designed client applications to be cloud native. Spark and azcopy were used to perform parts of the migration.
  • Environment: Azure Cloud.
  • Technologies Used: Azure Blob Storage, Azure Kubernetes Service (AKS), Azure OAuth.
  • Developing Spark jobs for data lake migration into the cloud.
  • Developing Helm charts for Azure AKS automation.
  • Refactoring application designs to be cloud native.
  • Onboarding internal customers to the Azure cloud.
  • Implementation of Spring Boot Kafka Streams applications.
  • Implementation of Argo Workflows pipelines.
Mar 2020 - May 2020
3 months
Munich, Germany

Big Data Architect, Data Architect

BMW AG

  • Worked on the AD-Vantage program, handling self-driving car data.
  • Environment: MapR + OpenShift cluster on premises (500+ nodes).
  • Technologies Used: MapR cluster (Hadoop), OpenShift, Elasticsearch + Kibana, Apache Airflow, Kafka Streams.
  • Developing data pipelines using Spark and Airflow for self-driving cars.
  • Generating metrics for geospatial applications.
  • Ingesting data into Elasticsearch using Apache Spark (see the sketch after this list).
  • Functional Programming with Scala.
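
A hedged sketch of writing a Spark DataFrame to Elasticsearch through the ES-Hadoop connector (the elasticsearch-spark jar must be on the cluster); index name, node address, and columns are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Illustrative geospatial metrics row; real data came from drive recordings.
    metrics_df = spark.createDataFrame(
        [("drive-42", 51.17, 8.98, 0.93)], ["drive_id", "lat", "lon", "coverage_score"]
    )

    (
        metrics_df.write.format("org.elasticsearch.spark.sql")
        .option("es.nodes", "elasticsearch.internal")
        .option("es.port", "9200")
        .option("es.resource", "geo-metrics")  # target index
        .mode("append")
        .save()
    )
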
Jan 2020 - May 2020
5 months
Stuttgart, Germany

Big Data Developer

DXC

  • Created an Azure service for inference at scale.
  • Environment: Azure Cloud.
  • Automating Azure Kubernetes cluster deployments.
  • Creating and deploying deep learning Spark jobs with PyTorch + GPUs on Kubernetes (see the sketch after this list).
  • Performing GPU inference on terabytes of data.
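
A hedged sketch of GPU inference inside a Spark pandas UDF with PyTorch; the model is a stand-in for a trained network, and column names are illustrative:

    import pandas as pd
    import torch
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    spark = SparkSession.builder.getOrCreate()
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(3, 1).to(device).eval()  # placeholder for a trained model

    @pandas_udf("double")
    def score(features: pd.Series) -> pd.Series:
        # features: one array of floats per row
        batch = torch.tensor(features.tolist(), dtype=torch.float32, device=device)
        with torch.no_grad():
            return pd.Series(model(batch).squeeze(1).cpu().numpy())

    df = spark.createDataFrame([([0.1, 0.2, 0.3],)], ["features"])
    df.withColumn("prediction", score("features")).show()
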
Jun 2018 - Mar 2020
1 year 10 months
Stuttgart, Germany

Big Data Architect

Daimler AG

  • Working with R&D on vehicle data to run TensorFlow GPU training.
  • Environment: Multiple MapR clusters (30+ nodes), NVIDIA Tesla GPUs, Apache Mesos.
  • Developing data pipelines using Airflow and Apache Spark (see the sketch after this list).
  • Developing end-to-end monitoring based on Prometheus.
  • Developing real-time data pipelines based on Docker, Kafka, and Python.
  • Deploying Marathon on Mesos with GPUs.
  • Designing the architecture for the migration from Mesos to Kubernetes.
  • Jenkins pipelines for building Docker images.
  • Operating Mesos on GPU clusters.
  • Several infrastructure tasks implemented with Ansible for high availability.
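
A short sketch of the Airflow orchestration pattern for the Spark pipelines; DAG id, schedule, and spark-submit commands are placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="vehicle_data_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        extract = BashOperator(
            task_id="extract_signals",
            bash_command="spark-submit /jobs/extract_signals.py {{ ds }}",
        )
        prepare = BashOperator(
            task_id="prepare_training_set",
            bash_command="spark-submit /jobs/prepare_training_set.py {{ ds }}",
        )
        extract >> prepare  # preparation runs only after extraction succeeds
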
Sep 2017 - Jun 2018
10 months
Nuremberg, Germany

Big Data Developer, Spark / Kafka Developer, Data Architect

GFK

  • In this project we ingested large amounts of data via Kafka into Accumulo. The entire Hadoop environment is Kerberized.
  • Environment: Cloudera Hadoop.
  • Writing Kafka connectors to ingest data.
  • Kerberizing applications for Hadoop, Kafka, and Kafka Connect.
  • Creating statistics plans for RDF4J queries over Accumulo.
  • Creating Apache NiFi workflows.
  • Introducing Git Flow automation, continuous integration, and Docker automation.
  • Setting up Kafka Connect with Kerberos on Google Kubernetes Engine.
  • Writing Java applications based on RDF (Semantic Web).
Apr 2017 - Sep 2017
6 months
Frankfurt, Germany

Big Data Architect

Deutsche Bahn

  • In this project I served as Hadoop Architect; tasks included sizing the Hadoop cluster, onboarding internal clients to the shared platform, and supporting the different data pipeline flows. All tools were used with a Kerberized Hadoop cluster.
  • Data migration using Sqoop and Oozie.
  • Configuring the Hadoop cluster with Kerberos and Active Directory.
  • Implementing data pipelines using Kylo, Apache NiFi, and Talend.
  • Deploying Hortonworks Cloudbreak on AWS.
  • Apache Storm streaming implementations.
  • Supporting internal clients with streaming and data cleaning operations.
  • Hadoop sizing for on-premises and AWS deployments.
Oct 2016 - Mar 2017
6 months
Dresden, Germany

Big Data Developer and Architect

Kiwigrid

  • In this project the main goal was to integrate Spark more deeply with HBase and to architect a new alerting and computing framework based on Spark Streaming. Every deployment is based on Docker.
  • Technologies Used: Apache HBase with Phoenix JDBC, Apache Ambari / Hortonworks, Apache Spark, Scala and Java, Vert.x server, Docker, TimescaleDB.
  • Creating reports with Spark jobs over historical data.
  • Custom Spark data sources for HBase and aggregations for data exploration.
Schweinfurt, Germany

SAP Administrator and Oracle Administrator

ZF Friedrichshafen AG and S.Oliver

  • Responsible for the service availability of the company's SAP systems, with more than 200 systems to maintain.
  • Some of the activities performed:
  • SAP and Oracle Upgrades.
  • SAP OS / HW Migration.
  • TREX Enterprise Search, ASCS Splits, SAP Security, SSO, SNC, SSFS.

Languages

German
Native
English
Advanced

Certifications & licenses

Databricks Foundation

Databricks Lakehouse Platform Accreditation

Confluent Certified Developer for Apache Kafka

Generative AI with Large Language Models (NLP)

CKAD: Certified Kubernetes Application Developer

Microsoft Certified: Azure Fundamentals

Data Engineering Nanodegree

Functional Programming Principles in Scala on Coursera

Big Data Analytics (Fraunhofer IAIS)

Big Data Analytics by University of California, San Diego on Coursera

Databricks Developer Training for Apache Spark

Hadoop Platform and Application Framework by University of California on Coursera

Machine Learning with Big Data by University of California, San Diego on Coursera

SAP OS and DB Migration (TADM70)

SAP Database Administration I (Oracle) (ADM 505)

SAP Database Administration II (Oracle) (ADM 506)

SAP NetWeaver AS Implementation and Operation I (SAP TADM10)

SAP NetWeaver Portal - Implementation and Operation (TEP10)

ITIL Foundation v4
