Jorge M.

Data Architect

Würzburg, Germany

Experience

Mar 2025 - Present
6 months

Data Architect

Deutsche Bahn

  • Responsible for the cloud architecture of the new data processing system for the Finance division of Deutsche Bahn.
  • Worked with DBT, Dagster, Kubernetes, and Glue, and delivered a Databricks proof of concept.
  • Environment: Kubernetes, AWS.
  • Tools: DBT, Dagster, Redshift, AWS, Glue, S3, VS Code, Databricks.
  • Designed and provided best practices for data modeling in dbt.
  • Handled slowly changing dimensions, late-arriving data, and testing.
  • Designed the ingestion flow from other systems into S3 and Redshift.
  • Designed and implemented new partitions for Dagster and incremental loading with DBT (see the sketch after this list).
  • Mapped business requirements to technical architectures.
  • Trained and guided junior developers.
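
A minimal sketch of the Dagster partitioning approach referenced above; the asset name, table, and start date are illustrative placeholders, not project specifics:

    # Daily-partitioned Dagster asset feeding incremental loads (illustrative names).
    from dagster import AssetExecutionContext, DailyPartitionsDefinition, asset

    daily_partitions = DailyPartitionsDefinition(start_date="2025-01-01")

    @asset(partitions_def=daily_partitions)
    def finance_transactions(context: AssetExecutionContext) -> None:
        # Each run materializes exactly one day, so downstream dbt models
        # can be rebuilt incrementally per partition.
        day = context.partition_key  # e.g. "2025-03-15"
        context.log.info(f"Loading raw finance data for {day} into S3/Redshift")
        # load_day_from_source(day)  # placeholder for the actual ingestion step
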
Sep 2024 - Mar 2025
7 months

Data Architect expert

SAP AG

  • As a Data Architect, I designed and implemented a cloud-agnostic architecture to support Kafka Tiered Storage across multiple environments. This involved defining rollout workflows, automation pipelines, and infrastructure abstraction to ensure scalable, cost-effective, and maintainable data streaming capabilities.
  • Scaled a cluster to handle over 25,000 Kafka partitions.
  • Environment: Kubernetes, Azure, AWS, Google Cloud.
  • Tools: Kubernetes, Gardener, GitHub, Python, Go, Kafka, Jenkins, Helm.
  • Led the architectural design and implementation of Kafka Tiered Storage rollout across 30+ Kubernetes clusters in multi-cloud environments (Azure, AWS, GCP).
  • Defined and implemented infrastructure provisioning using Crossplane, enabling declarative and consistent infrastructure deployment across cloud providers.
  • Developed a custom Go-based Kafka Operator, abstracting platform complexity and standardizing the tiered storage activation process for data pipelines (see the sketch after this list).
  • Designed and automated GitOps-based deployment strategies using Flux and Helm, ensuring safe and repeatable rollout procedures.
  • Optimized Gardener shoot configurations to align cluster resources with Kafka workload and cost efficiency requirements.
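
A hedged sketch of the topic-level tiered storage activation step via the Kafka Admin API from Python; the broker address and topic name are placeholders, and the production rollout was driven by the Go-based operator and GitOps pipelines rather than ad-hoc scripts:

    from confluent_kafka.admin import AdminClient, ConfigResource

    admin = AdminClient({"bootstrap.servers": "kafka-broker:9092"})  # placeholder broker

    # KIP-405 topic-level flag; broker-side remote log storage must already be configured.
    resource = ConfigResource(
        ConfigResource.Type.TOPIC,
        "orders-events",  # illustrative topic name
        set_config={"remote.storage.enable": "true"},
    )

    for res, future in admin.alter_configs([resource]).items():
        future.result()  # raises if the broker rejects the config change
        print(f"Tiered storage enabled for {res}")
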
May 2024 - Nov 2024
7 months

Data Architect expert

s.Oliver GmbH

  • As the lead Data Architect, I was responsible for the end-to-end migration from SAP HANA to a modern, scalable Azure-based Databricks Lakehouse. This initiative involved redesigning the data architecture, implementing robust ETL pipelines, and introducing advanced analytics and AI capabilities — resulting in annual cost savings of over €50,000 by decommissioning legacy SAP infrastructure.
  • Environment: Databricks / Azure.
  • Tools & Technologies: Databricks, Azure Data Lake, SAP HANA, PySpark, DBT, Kafka, Azure DevOps, Delta Lake, Python, FP-Growth, Time Series Forecasting.
  • Designed a medallion architecture on Databricks to support scalable and modular data ingestion, transformation, and consumption.
  • Led the implementation of incremental ETL pipelines using PySpark to extract and process SAP data efficiently.
  • Architected and implemented DBT-based semantic layers, including dimensional modeling for fact and dimension tables.
  • Established Dev to Prod CI/CD pipelines to standardize deployment and enforce governance.
  • Defined role-based access control and security concepts aligned with enterprise Azure standards.
  • Enabled real-time data integration by connecting Kafka streams to Databricks, enriching analytical capabilities.
  • Introduced AI/ML use cases, including FP-Growth for basket analysis and time series forecasting models (see the sketch after this list).
  • Mentored junior developers on Databricks best practices, ensuring long-term platform adoption and team scaling.
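
A minimal sketch of the FP-Growth basket analysis mentioned above, using PySpark ML; the toy transactions are illustrative, not the actual retail data:

    from pyspark.sql import SparkSession
    from pyspark.ml.fpm import FPGrowth

    spark = SparkSession.builder.getOrCreate()

    # Illustrative baskets; production input came from curated sales data in the Lakehouse.
    baskets = spark.createDataFrame(
        [(1, ["jeans", "belt"]), (2, ["jeans", "shirt", "belt"]), (3, ["shirt"])],
        ["basket_id", "items"],
    )

    fp = FPGrowth(itemsCol="items", minSupport=0.3, minConfidence=0.6)
    model = fp.fit(baskets)

    model.freqItemsets.show()      # frequently co-purchased item sets
    model.associationRules.show()  # rules such as jeans -> belt with confidence and lift
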
Jan 2023 - Aug 2023
8 months

Data Architect expert

ias Gruppe

  • As a Data Architect, I was responsible for designing and implementing a modern, scalable Azure-based Data Lake architecture to support real-time data ingestion and processing from IoT and telemetry sources. The platform was built to provide structured, analytics-ready data for multiple departments, supporting both operational dashboards and advanced analytics.
  • Environment: Azure.
  • Tools & Technologies: Azure Synapse, Delta Lake, Azure Data Lake Gen2, Azure IoT Hub, Azure Event Hub, Azure Service Bus, Azure Data Factory, DBT, Airbyte, Power BI, Azure Monitor, Log Analytics, Python, SQL.
  • Architected an end-to-end Azure Data Lakehouse solution leveraging Azure Synapse, Delta Lake, and Azure Data Lake Storage Gen2, ensuring scalable and performant storage and query capabilities.
  • Designed and implemented streaming ingestion pipelines using Azure IoT Hub, Azure Event Hub, and Azure Service Bus, enabling real-time telemetry data capture from thousands of IoT devices.
  • Developed data integration and transformation flows using Airbyte for ELT and DBT for business logic modeling, dimensional design, and lineage tracking.
  • Orchestrated complex data workflows using Azure Data Factory, integrating batch and streaming processes into a unified data pipeline.
  • Implemented Delta Lake-based time travel and ACID transactions to ensure reliability and traceability of business-critical data (see the sketch after this list).
  • Designed role-based access control (RBAC), resource tagging strategies, and monitoring with Azure Monitor and Log Analytics, ensuring operational transparency and data security.
  • Enabled Power BI integration for stakeholders to explore data in near real-time and develop business dashboards.
  • Collaborated with product and operations teams to capture functional requirements and translate them into scalable data architecture patterns.
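
A short sketch of the Delta Lake time-travel reads used for traceability; the storage path, version, and timestamp values are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    path = "abfss://lake@storageaccount.dfs.core.windows.net/silver/telemetry"  # placeholder

    current = spark.read.format("delta").load(path)

    # Read the same table as of an earlier version or timestamp to audit changes.
    as_of_version = spark.read.format("delta").option("versionAsOf", 12).load(path)
    as_of_time = (
        spark.read.format("delta")
        .option("timestampAsOf", "2023-05-01 00:00:00")
        .load(path)
    )
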
Sep 2022 - May 2024
1 year 9 months
Frankfurt, Germany

Data Architect expert

Deutsche Bahn

  • As Data Architect, I led the development of a large-scale, cloud-native data platform on AWS for processing streaming and batch data in the transportation domain. The architecture enabled real-time analytics and delta ingestion into a multi-hundred-terabyte data lake, optimizing operations around train delays, departures, and predictive insights.
  • Environment: AWS.
  • Tools & Technologies: AWS Kinesis, TimescaleDB, AWS Glue, Apache Hudi, Lambda, S3, DBT, PostgreSQL, CDK, GitLab, Spark, Athena, CloudWatch.
  • Designed and implemented real-time streaming architectures using AWS Kinesis, Lambda, and Apache Spark to support time-sensitive analytics use cases.
  • Architected delta ingestion pipelines on AWS Glue and Apache Hudi, enabling efficient small file compaction and time travel analytics at scale (see the sketch after this list).
  • Delivered business-critical KPIs and dashboards, with end-to-end data lineage and auditability across S3, PostgreSQL, and CloudWatch.
  • Defined and enforced infrastructure-as-code (IaC) principles using AWS CDK, enabling scalable and replicable environments.
  • Introduced and rolled out DBT for semantic modeling and reusable business logic, integrating into CI/CD workflows with GitLab.
  • Conducted architectural evaluations of Databricks, Snowflake, and AWS Athena, providing decision support for future platform strategy.
  • Mentored a team of developers, optimizing development cycles and ensuring best practices in cloud data engineering.
  • Implemented IoT 4.0 pipelines for ingesting telemetry data and supporting predictive analytics initiatives.
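
A hedged sketch of the Hudi upsert pattern behind the delta ingestion pipelines; table name, keys, and paths are placeholders, and the Hudi Spark bundle must be on the cluster classpath:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Illustrative micro-batch of changed rows; production data arrived via Glue jobs.
    incoming_df = spark.createDataFrame(
        [("e-1", "2024-01-01T10:00:00", 3)], ["event_id", "event_ts", "delay_minutes"]
    )

    hudi_options = {
        "hoodie.table.name": "train_events",
        "hoodie.datasource.write.recordkey.field": "event_id",
        "hoodie.datasource.write.precombine.field": "event_ts",
        "hoodie.datasource.write.operation": "upsert",  # merge changes into existing files
    }

    (
        incoming_df.write.format("hudi")
        .options(**hudi_options)
        .mode("append")
        .save("s3://data-lake/raw/train_events/")
    )
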
Sep 2021 - Sep 2022
1 year 1 month
Rottendorf, Germany

Kafka Expert

s.Oliver GmbH

  • In this project we redesigned the complete purchase order and material chain, moving from batch processing to real-time processing with Kafka (see the sketch after this list).
  • Environment: Confluent Cloud and Azure.
  • Developing Spring Boot Kafka Streams applications.
  • Developing custom Kafka source connectors to extract data from SAP systems.
  • Developing custom Kafka sink connectors to write to SAP systems.
  • Deploying Kafka Connect connectors with monitoring into an Azure Kubernetes cluster.
  • Developing data pipelines using Airflow and the Azure cloud.
  • Developing the architecture for the data pipelines between on-premises systems and the Azure cloud.
  • Writing Spark jobs to clean and aggregate data.
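
An illustrative sketch of the batch-to-real-time idea: publishing a purchase-order change as a Kafka event instead of waiting for a nightly batch. Topic name and payload fields are placeholders; the production flow ran through Kafka Connect and Spring Boot Kafka Streams applications, and Confluent Cloud would additionally require SASL/SSL settings:

    import json
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "kafka-broker:9092"})  # placeholder broker

    order_event = {"po_number": "4500001234", "material": "MAT-001", "status": "CREATED"}
    producer.produce(
        topic="purchase-orders",
        key=order_event["po_number"],
        value=json.dumps(order_event),
    )
    producer.flush()  # block until the event is delivered
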
Mar 2021 - Jun 2021
4 months
Würzburg, Germany

Data Warehouse expert

Büro Forum GmbH

  • Developed a data warehouse for the Concept office ERP system.
  • Environment: Google Big Query and DBT.
  • Developing dbt workflows and star schemas for the data warehouse (see the sketch after this list).
  • Developing ELT workflows with Stitch Data.
  • Developing dashboards with Power BI in the Azure cloud.
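
A small sketch of querying the resulting star schema in BigQuery from Python; dataset, table, and column names are illustrative, and the dimensional models themselves were maintained with dbt:

    from google.cloud import bigquery

    client = bigquery.Client()  # uses application-default credentials

    sql = """
        SELECT d.customer_name, SUM(f.net_amount) AS revenue
        FROM dwh.fact_orders AS f
        JOIN dwh.dim_customer AS d ON f.customer_key = d.customer_key
        GROUP BY d.customer_name
        ORDER BY revenue DESC
        LIMIT 10
    """

    for row in client.query(sql).result():
        print(row.customer_name, row.revenue)
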
Feb 2021 - Aug 2022
1 year 7 months

Software Developer

RTL Deutschland

  • In this project, I architected and delivered a highly complex and compliance-driven data sharing platform on Microsoft Azure. The solution enabled secure, governed, and scalable access to sensitive business data across departments and partners, supporting both analytical and operational use cases.
  • Environment: Microsoft Azure.
  • Tools & Technologies: Azure Databricks, Azure Synapse, Delta Lake, Azure App Services, DBT, FastAPI, PySpark, Power BI, Azure DevOps, Azure Monitor, Azure Key Vault, Python, SQL.
  • Designed and implemented a Lakehouse architecture combining Azure Databricks, Delta Lake, and Azure Synapse to support both batch and real-time workloads with ACID compliance and scalable performance.
  • Built RESTful data APIs using FastAPI and deployed them securely via Azure App Services, providing a controlled access layer to the data platform (see the sketch after this list).
  • Developed incremental ETL pipelines using PySpark and DBT, implementing star schema models for semantic consistency, historical tracking, and governed self-service analytics.
  • Enabled interactive reporting and visual analytics using Power BI, directly integrated into the Azure ecosystem for performance and security compliance.
  • Implemented strong data access controls, audit logging, and resource monitoring to ensure compliance with GDPR and internal data governance policies.
  • Established automated deployment processes and CI/CD pipelines for data infrastructure components using Azure-native tooling.
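
A minimal sketch of the FastAPI access-layer pattern; the route, model, and in-memory catalog are hypothetical placeholders for the governed Lakehouse queries behind the real API:

    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel

    app = FastAPI(title="Data sharing API")

    class Dataset(BaseModel):
        name: str
        rows: int

    # Hypothetical catalog; production requests were served from the Lakehouse.
    CATALOG = {"viewing_stats": Dataset(name="viewing_stats", rows=1_250_000)}

    @app.get("/datasets/{name}", response_model=Dataset)
    def get_dataset(name: str) -> Dataset:
        if name not in CATALOG:
            raise HTTPException(status_code=404, detail="dataset not found")
        return CATALOG[name]
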
Sep 2020 - Jun 2021
10 months
Munich, Germany

Cloud Solution Architect

Allianz Technology

  • Migration of data lakes into the Azure cloud, with a high level of automation using ArgoCD, Jenkins, Helm charts, and Terraform. Designed client applications to be cloud native. Spark and azcopy were used to perform parts of the migration.
  • Environment: Azure Cloud.
  • Technologies Used: Azure Blob Storage, Azure Kubernetes Service (AKS), Azure OAuth.
  • Developing Spark jobs for data lake migration into the cloud.
  • Developing Helm charts for Azure AKS automation.
  • Refactoring application designs to be cloud native.
  • Onboarding internal customers to the Azure cloud.
  • Implementation of Spring Boot Kafka Streams applications.
  • Implementation of Argo Workflows pipelines.
Mar 2020 - May 2020
3 months
Munich, Germany

Big Data Architect, Data Architect

BMW AG

  • Worked on the AD-Vantage program, handling self-driving car data.
  • Environment: MapR + OpenShift cluster on premises (500+ nodes).
  • Technologies Used: MapR cluster (Hadoop), OpenShift, Elasticsearch + Kibana, Apache Airflow, Kafka Streams.
  • Developing data pipelines using Spark and Airflow for self-driving cars.
  • Generating metrics for geospatial applications.
  • Ingesting data into Elasticsearch using Apache Spark (see the sketch after this list).
  • Functional Programming with Scala.
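
A hedged sketch of writing a Spark DataFrame to Elasticsearch through the ES-Hadoop connector (the elasticsearch-spark jar must be on the cluster); index name, node address, and columns are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Illustrative geospatial metrics row; real data came from drive recordings.
    metrics_df = spark.createDataFrame(
        [("drive-42", 51.17, 8.98, 0.93)], ["drive_id", "lat", "lon", "coverage_score"]
    )

    (
        metrics_df.write.format("org.elasticsearch.spark.sql")
        .option("es.nodes", "elasticsearch.internal")
        .option("es.port", "9200")
        .option("es.resource", "geo-metrics")  # target index
        .mode("append")
        .save()
    )
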
Jan 2020 - May 2020
5 months
Stuttgart, Germany

Big Data Developer

DXC

  • Created an Azure service for inference at scale.
  • Environment: Azure Cloud.
  • Automating Azure Kubernetes cluster deployments.
  • Creating and deploying deep learning Spark jobs with PyTorch + GPUs on Kubernetes (see the sketch after this list).
  • Performing GPU inference on terabytes of data.
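
A hedged sketch of GPU inference inside a Spark pandas UDF with PyTorch; the model is a stand-in for a trained network, and column names are illustrative:

    import pandas as pd
    import torch
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    spark = SparkSession.builder.getOrCreate()
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(3, 1).to(device).eval()  # placeholder for a trained model

    @pandas_udf("double")
    def score(features: pd.Series) -> pd.Series:
        # features: one array of floats per row
        batch = torch.tensor(features.tolist(), dtype=torch.float32, device=device)
        with torch.no_grad():
            return pd.Series(model(batch).squeeze(1).cpu().numpy())

    df = spark.createDataFrame([([0.1, 0.2, 0.3],)], ["features"])
    df.withColumn("prediction", score("features")).show()
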
Jun 2018 - Mar 2020
1 year 10 months
Stuttgart, Germany

Big Data Architect

Daimler AG

  • Working with R&D on vehicle data to run TensorFlow GPU training.
  • Environment: Multiple MapR clusters (30+ nodes), NVIDIA Tesla GPUs, Apache Mesos.
  • Developing data pipelines using Airflow and Apache Spark (see the sketch after this list).
  • Developing end-to-end monitoring based on Prometheus.
  • Developing real-time data pipelines based on Docker, Kafka, and Python.
  • Deploying Marathon on Mesos with GPUs.
  • Designing the architecture for the migration from Mesos to Kubernetes.
  • Jenkins pipelines for building Docker images.
  • Operating Mesos on GPU clusters.
  • Several infrastructure tasks implemented with Ansible for high availability.
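
A short sketch of the Airflow orchestration pattern for the Spark pipelines; DAG id, schedule, and spark-submit commands are placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="vehicle_data_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        extract = BashOperator(
            task_id="extract_signals",
            bash_command="spark-submit /jobs/extract_signals.py {{ ds }}",
        )
        prepare = BashOperator(
            task_id="prepare_training_set",
            bash_command="spark-submit /jobs/prepare_training_set.py {{ ds }}",
        )
        extract >> prepare  # preparation runs only after extraction succeeds
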
Sep 2017 - Jun 2018
10 months
Nuremberg, Germany

Big Data Developer, Spark / Kafka Developer, Data Architect

GFK

  • In this project we ingested large amounts of data via Kafka into Accumulo. The entire Hadoop environment is Kerberized.
  • Environment: Cloudera Hadoop.
  • Writing Kafka connectors to ingest data.
  • Kerberizing applications for Hadoop, Kafka, and Kafka Connect.
  • Creating statistics plans for RDF4J queries over Accumulo.
  • Creating Apache NiFi workflows.
  • Introducing Git Flow automation, continuous integration, and Docker automation.
  • Setting up Kafka Connect with Kerberos on Google Kubernetes Engine.
  • Writing Java applications based on RDF (Semantic Web).
Apr 2017 - Sep 2017
6 months
Frankfurt, Germany

Big Data Architect

Deutsche Bahn

  • In this project I served as Hadoop Architect; tasks included sizing the Hadoop cluster, onboarding internal clients to the shared platform, and supporting the different data pipeline flows. All tools were used with a Kerberized Hadoop cluster.
  • Data migration using Sqoop and Oozie.
  • Configuring the Hadoop cluster with Kerberos and Active Directory.
  • Implementing data pipelines using Kylo, Apache NiFi, and Talend.
  • Deploying Hortonworks Cloudbreak on AWS.
  • Apache Storm streaming implementations.
  • Supporting internal clients with streaming and data cleaning operations.
  • Hadoop sizing for on-premises and AWS deployments.
Oct 2016 - Mar 2017
6 months
Dresden, Germany

Big Data Developer and Architect

Kiwigrid

  • In this project the main goal was to integrate Spark more deeply with HBase and to architect a new alerting and computing framework based on Spark Streaming. Every deployment is based on Docker.
  • Technologies Used: Apache HBase with Phoenix JDBC, Apache Ambari / Hortonworks, Apache Spark, Scala and Java, Vert.x server, Docker, TimescaleDB.
  • Creating reports with Spark jobs over historical data.
  • Custom Spark data sources for HBase and aggregations for data exploration.
Schweinfurt, Germany

SAP Administrator and Oracle Administrator

ZF Friedrichshafen AG and S.Oliver

  • Responsible for the service availability of the company's SAP systems, with more than 200 systems to maintain.
  • Some of the activities performed:
  • SAP and Oracle Upgrades.
  • SAP OS / HW Migration.
  • TREX Enterprise Search, ASCS Splits, SAP Security, SSO, SNC, SSFS.

Languages

German
Native
English
Advanced

Certifications & licenses

Databricks Foundation

Databricks Lakehouse Platform Accreditation

Confluent Certified Developer for Apache Kafka

Generative AI with Large Language Models (NLP)

CKAD: Certified Kubernetes Application Developer

Microsoft Certified: Azure Fundamentals

Data Engineering Nanodegree

Functional Programming Principles in Scala on Coursera

Big Data Analytics (Fraunhofer IAIS)

Big Data Analytics by University of California, San Diego on Coursera

Databricks Developer Training for Apache Spark

Hadoop Platform and Application Framework by University of California on Coursera

Machine Learning with Big Data by University of California, San Diego on Coursera

SAP OS and DB Migration (TADM70)

SAP Database Administration I (Oracle) (ADM 505)

SAP Database Administration II (Oracle) (ADM 506)

SAP NetWeaver AS Implementation and Operation I (SAP TADM10)

SAP NetWeaver Portal - Implementation and Operation (TEP10)

ITIL Foundation v4
