Jan Krol
Data Expert
Experience
Data Expert
Manufacturing
Data Expert
Intralogistics
- Consulted on and implemented AWS infrastructure supporting global process operations in Transport & Logistics
- Provisioned and operated servers, OS environments, and databases in AWS
- Identified and presented optimization opportunities in both commercial and technical terms
- Administered and maintained the provisioned systems
- Developed maintenance and monitoring concepts (see the sketch below)
- Advised development projects on system use, configuration, and optimization
- Consulted on architectures and operational concepts using AWS Cloud
- Trained internal employees on new AWS services and working methods
Services: AWS Glue, Redshift, EMR, SageMaker, Python
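A minimal sketch of the kind of operational AWS automation described above, using boto3; the Glue job name and region are illustrative assumptions, not taken from the project itself:

```python
import boto3

# Hypothetical job name and region -- not named in the profile.
GLUE_JOB_NAME = "daily-logistics-etl"

glue = boto3.client("glue", region_name="eu-central-1")

# Start a Glue job run and check its status -- the kind of routine
# maintenance/monitoring task such a concept would automate.
run = glue.start_job_run(JobName=GLUE_JOB_NAME)
state = glue.get_job_run(JobName=GLUE_JOB_NAME, RunId=run["JobRunId"])
print(state["JobRun"]["JobRunState"])  # e.g. RUNNING, SUCCEEDED, FAILED
```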
Data Expert
Logistics
- Developed and implemented a standardized big data architecture for group-wide platform services in the Transport & Logistics sector on Azure
- Automated solutions using Infrastructure as Code (Terraform, Ansible)
- Presented and discussed sub-project architectures on Azure
- Implemented real-time data streaming with Apache Kafka and monitoring solutions (see the consumer sketch below)
- Advised on Azure platform strategy and reference architectures
- Developed mechanisms for proactive elimination of vulnerabilities in Azure and Kubernetes clusters
- Designed container orchestration platforms with Kubernetes and CI/CD
- Created user and authorization concepts according to group specifications
- Managed operational services within an agile team
Services: Azure Purview, Azure Synapse Analytics, Azure Data Factory, Azure Databricks, Terraform, GitLab Runner, Azure DevOps
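A minimal sketch of the consumption side of the Kafka streaming work above, using the confluent-kafka Python client; broker address, topic, and consumer group are assumed names:

```python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "broker:9092",   # assumed broker address
    "group.id": "platform-monitoring",    # assumed consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["logistics-events"])  # assumed topic

try:
    while True:
        msg = consumer.poll(1.0)          # wait up to 1s for the next record
        if msg is None:
            continue
        if msg.error():
            print(f"consumer error: {msg.error()}")
            continue
        # Hand the payload to downstream processing / monitoring
        print(msg.value().decode("utf-8"))
finally:
    consumer.close()
```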
Data Expert
E-Commerce
- Strategically developed and migrated analytics data pipelines into a Data Lakehouse architecture on AWS
- Enhanced the Big Data Lake environment and ensured stringent data quality and GDPR compliance
- Performed exploratory analysis and algorithm development through data provisioning and preparation (AWS Glue, Spark, Lambda)
- Developed ETL jobs and data pipelines to provide ready-to-consume data sources (AWS Glue, Redshift, Spark, PySpark; see the sketch below)
- Conducted regression testing and quality checks in data pipelines and the data lake
- Implemented high-performance streaming data processing with Kinesis, Kafka, and Lambda
- Orchestrated and connected multiple data sources
- Automated deployments using DevOps best practices (CodeBuild, CodePipeline, GitHub Actions)
- Built infrastructure with IaC (AWS CDK)
- Monitored data quality, compliance, and costs
Services: AWS Glue, Kinesis, Kafka, Apache Spark, Data Catalog, S3, Athena, Redshift, Lambda, ECS, Step Functions
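A minimal PySpark sketch of an ETL job of the kind listed above; bucket paths and column names are illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Read raw JSON events from the landing zone (assumed path).
raw = spark.read.json("s3://raw-zone/orders/")

# Typical lakehouse preparation: drop invalid rows, derive a
# partition column, write curated Parquet for downstream consumers.
curated = (
    raw.filter(F.col("order_id").isNotNull())
       .withColumn("order_date", F.to_date("created_at"))
)

(curated.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://curated-zone/orders/"))
```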
Data Expert
E-Commerce
- Guided internal e-commerce product teams in developing, implementing, and maintaining high-performance data processing and integration systems
- Migrated existing data services, pipelines, and assets to a new event-based serverless architecture
- Developed and executed Lambda functions and PySpark jobs (see the handler sketch below)
- Designed architecture and integration with Kafka for real-time processing and analysis of event data
- Implemented PySpark transformations, filtering, and aggregations
- Ensured efficient and reliable connection with Kafka, configured security settings, and integrated with other components
- Established extensive testing and monitoring mechanisms
- Delivered a high-performance, scalable event system enabling data-driven decision-making
Services: AWS Glue, Apache Spark, Data Catalog, S3, Athena, Redshift, Lambda, ECS, Step Functions
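A minimal sketch of a Lambda handler in the event-based serverless style described above; the queue-delivered event shape (SQS-like) and payload fields are assumptions:

```python
import json

def handler(event, context):
    """Process a batch of queue-delivered events (assumed SQS shape)."""
    processed = 0
    for record in event.get("Records", []):
        payload = json.loads(record["body"])
        # Minimal enrichment before handing the event downstream
        payload["processed"] = True
        processed += 1
    return {"statusCode": 200, "processed": processed}
```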
Data Expert
Transport & Logistics
- Integrated logistics data streams with Event Hub and Kafka using PySpark Structured Streaming
- Designed and implemented a pipeline for capturing, processing, and forwarding data streams
- Utilized PySpark Structured Streaming for efficient real-time data processing (see the sketch below)
- Configured and initialized PySpark streaming jobs and defined necessary data structures
- Conducted comprehensive testing and monitoring to ensure smooth data transmission and high data quality
- Enabled robust and efficient integration of logistics data streams with Event Hubs
- Delivered real-time utilization of logistics data for analysis and further processing
Services: Azure Synapse Analytics, Purview Data Catalog, Apache Spark, Event Hub, Structured Streaming, GraphFrame, Azure Storage v2, Power BI
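A minimal PySpark Structured Streaming sketch in the spirit of the pipeline above, reading via the Kafka-compatible endpoint that Event Hubs exposes; namespace, topic, and storage paths are assumptions, and SASL authentication options are omitted for brevity:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("logistics-stream").getOrCreate()

# Event Hubs offers a Kafka-compatible endpoint on port 9093;
# the namespace and topic below are illustrative assumptions.
stream = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers",
                 "mynamespace.servicebus.windows.net:9093")
         .option("subscribe", "logistics-events")
         .load()
)

parsed = stream.select(
    F.col("key").cast("string"),
    F.col("value").cast("string").alias("payload"),
)

# Forward the captured stream to storage for analysis (assumed paths).
query = (
    parsed.writeStream
          .format("parquet")
          .option("checkpointLocation", "/mnt/checkpoints/logistics-events")
          .option("path", "/mnt/landing/logistics-events")
          .start()
)
query.awaitTermination()
```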
Data Expert
Transport & Logistics
- Spearheaded development of a robust data strategy and governance framework to streamline and enhance data handling capabilities
- Constructed a sophisticated data management platform on Databricks
- Designed and implemented an efficient data hub ingestion platform
- Led the design and establishment of an organization-wide data strategy aligned with business goals
- Developed a comprehensive data governance framework ensuring data accuracy, privacy, and compliance
- Oversaw deployment and customization of the data management platform on Databricks
- Enhanced data processing, analysis, and reporting capabilities with Power BI
- Engineered a robust data hub with advanced ingestion pipelines based on AWS EventBridge (see the sketch below)
- Optimized data flow from diverse sources to centralized storage systems (Data Lakehouse on Azure)
- Collaborated with cross-functional teams to integrate the data management platform with existing IT infrastructure
- Conducted training sessions and workshops to foster a data-driven culture and enhance data literacy
Services: Azure Databricks, Databricks Data Catalog, AWS EventBridge, Kinesis, Event Hub, Structured Streaming, Apache Spark
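A minimal boto3 sketch of publishing to an EventBridge-based ingestion hub like the one mentioned above; bus name, source, and event shape are assumptions:

```python
import json
import boto3

events = boto3.client("events", region_name="eu-central-1")  # assumed region

response = events.put_events(
    Entries=[{
        "EventBusName": "data-hub-ingestion",  # assumed bus name
        "Source": "logistics.shipments",       # assumed event source
        "DetailType": "ShipmentUpdated",       # assumed event type
        "Detail": json.dumps({"shipment_id": "S-123",
                              "status": "in_transit"}),
    }]
)
print(response["FailedEntryCount"])  # 0 when all entries were accepted
```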
Data Expert
Transport & Logistics
- Served as the technical lead managing a team of 3 offshore developers while implementing scalable and robust data solutions in Azure Databricks
- Introduced Delta Live Tables for schema and table management
- Implemented Databricks Asset Bundles following an Infrastructure-as-Code approach
- Designed and refined the medallion data architecture to optimize data processing workflows (see the pipeline sketch below)
- Collaborated closely with multiple business units to ensure data solutions met their specific requirements
- Established coding standards and best practices for the development team
- Conducted code reviews and provided technical guidance
- Facilitated knowledge transfer and technical upskilling sessions
- Developed scalable ETL pipelines in Azure Databricks
- Created optimized data storage solutions with future scalability in mind
- Established a complete IaC workflow for data platform components
- Integrated version control and CI/CD for Databricks Asset Bundles
- Automated deployment of table schemas, jobs, and notebooks
- Implemented environment promotion strategies (Dev/Test/Prod)
- Managed configuration for cross-environment consistency
Services: Azure Databricks, Delta Live Tables, Databricks Asset Bundles, Azure Data Factory, Delta Lake, Spark SQL, Azure Key Vault, Azure Storage, Power BI
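A minimal Delta Live Tables sketch of the bronze-to-silver step of a medallion architecture like the one above; the source path, table names, and columns are assumptions, and `spark` is provided by the DLT runtime:

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Bronze: raw shipment events as ingested")
def shipments_bronze():
    # Auto Loader ingestion from an assumed landing path
    return (
        spark.readStream.format("cloudFiles")
             .option("cloudFiles.format", "json")
             .load("/mnt/landing/shipments/")
    )

@dlt.table(comment="Silver: validated, typed shipment events")
@dlt.expect_or_drop("valid_id", "shipment_id IS NOT NULL")
def shipments_silver():
    # Expectation above drops rows with a missing shipment_id
    return (
        dlt.read_stream("shipments_bronze")
           .withColumn("event_date", F.to_date("event_ts"))
    )
```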
Summary
Big Data Specialist with a focus on Big Data, Cloud Architecture, and Data Management Platforms
Skills
Big Data Platform Specialist with Focus on Amazon Web Services & Microsoft Azure
ETL Processes/Pipelines & Data Engineering
Architecture of Data Management Platforms in Enterprises
Build-Up of Data Lakes & Data Lakehouses
Application Migrations Using Cloud Services
Consulting & Implementation of Automation Concepts, Especially DevOps
Integration of Active Directory, Security Concepts, and Compliance Requirements
Monitoring and Logging
Confident in Python, SQL, TypeScript, Golang
Big Data Cloud Architecture (AWS & Microsoft Azure)
Data Engineering (Databricks, Synapse Analytics, Fabric, Apache Spark, AWS Glue, Athena, Redshift & EMR)
Infrastructure as Code (Terraform, Pulumi, AWS CDK, ARM)
Certifications & Licenses
AWS Business Professional
AWS Certified Cloud Practitioner
AWS Certified Machine Learning – Specialty
AWS Certified Solutions Architect – Associate
AWS Technical Professional
Azure Solutions Architect Expert (AZ-300: Microsoft Azure Architect Technologies, AZ-301: Microsoft Azure Architect Design)
Databricks Certified Associate Developer for Apache Spark 3.0
HashiCorp Certified: Terraform Associate