Dr. Serge Kalinin

Senior DevOps (extern)

Munich, Germany

Experience

Apr 2022 - Present
2 years 11 months
Karlsruhe, Germany

Senior DevOps (extern)

Atruvia AG

Development of the Data Integration Hub (DIH) platform within the scope of a Data Governance project. DIH is the central architecture element for sharing data between tenants. It is primarily based on data product descriptions (specifications), data catalogs and services that represent the shared data. A typical workflow has the following steps (a consumer sketch follows the list):

  • a data product description is injected via the REST API or Swagger UI
  • its metadata is written into Kafka topics
  • Kafka consumers read the metadata and take actions such as creating metadata entries in DataHub, creating tables in Trino, creating predefined file structures on S3, creating policies, etc.
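
A minimal Python sketch of the consumer side, assuming the kafka-python client; the topic name, broker address and consumer group are illustrative placeholders rather than the actual DIH configuration:

  import json

  from kafka import KafkaConsumer  # pip install kafka-python

  consumer = KafkaConsumer(
      "dih.data-products",             # hypothetical topic name
      bootstrap_servers="kafka:9092",  # placeholder broker address
      group_id="dih-provisioner",      # placeholder consumer group
      value_deserializer=lambda v: json.loads(v.decode("utf-8")),
  )

  for message in consumer:
      spec = message.value  # a data product description injected via the REST API
      # each consumer reacts to the metadata, e.g. registering the dataset in
      # DataHub, creating a table in Trino, or laying out folder structures on S3
      print(f"provisioning data product {spec.get('name')}")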

My tasks focused on:

  • Implementation of single sign-on in services based on JWTs (see the sketch after this list)
  • Development of REST APIs
  • Development of integration tools between software components (SelfService, Trino, S3, DataHub, Great Expectations, etc.)
  • Development of data quality validation services
  • Development of ETL pipelines
  • Onboarding of new customers
  • Development of monitoring systems
  • Troubleshooting and support of services and customers
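
A minimal sketch of the JWT validation underpinning the single sign-on work, assuming the PyJWT library; the audience and issuer are illustrative, and a production service would fetch the signing key from the identity provider's JWKS endpoint:

  import jwt  # pip install PyJWT

  def validate_token(token: str, public_key: str) -> dict:
      """Verify the signature and standard claims of an incoming bearer token."""
      return jwt.decode(
          token,
          public_key,
          algorithms=["RS256"],
          audience="dih-api",                # hypothetical audience
          issuer="https://sso.example.com",  # hypothetical issuer
      )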

The following software stack is used for development:

  • CI/CD: OpenShift (Kubernetes), Helm, Docker, Git, Tekton, ArgoCD
  • Data catalogs and data lineage: DataHub and OpenLineage, with integrations for Spark and pandas; these services are implemented in Python
  • SQL engines: Trino with Starburst Web UI, PostgreSQL, Hadoop, DB2, Delta Lake
  • Data quality: Great Expectations
  • REST API: Java, Swagger, Spring Boot
  • Authentication: JWT, OAuth2, Single Sign-On
  • Access policies: Apache Ranger
  • Monitoring: Prometheus, Grafana

Certification: AWS Certified Data Engineer - Associate

Oct 2021 - Apr 2023
1 year 7 months
Hamburg, Germany

Senior DevOps (extern)

Otto GmbH & Co KG

Design and implementation of data-driven microservices for search engine (Google) optimization using AWS services. The services mainly follow ETL patterns: a typical service gets data from a source (REST API, SQS, DynamoDB, etc.), transforms it (e.g. calculates changes in a list with respect to the previous day) and uploads the results to a backend (S3, database).
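
A minimal Python sketch of this ETL pattern, assuming boto3; the queue URL and bucket name are placeholders, and the transform is a stand-in for the real diff logic:

  import json

  import boto3

  sqs = boto3.client("sqs")
  s3 = boto3.client("s3")
  QUEUE_URL = "https://sqs.eu-central-1.amazonaws.com/123456789012/seo-input"  # placeholder

  def run_once() -> None:
      # extract: poll the source queue
      resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10)
      for msg in resp.get("Messages", []):
          record = json.loads(msg["Body"])
          # transform: e.g. compute the change against the previous day's list
          record["changed"] = True  # stand-in for the real diff logic
          # load: upload the result to the backend bucket
          s3.put_object(
              Bucket="seo-results",  # hypothetical bucket
              Key=f"results/{msg['MessageId']}.json",
              Body=json.dumps(record).encode("utf-8"),
          )
          sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])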

Service I (MLOps). Assessment of OTTO pages by extracting keywords that describe the pages' content and matching them with Google searches. Migration of data transformation, model training and retraining, and model deployment from GCP to AWS. Design and implementation of workflows:

  • Use of GitHub Actions as CI/CD pipelines
  • Use of Terraform to manage cloud resources (creation of containers, load balancing of model instances, etc.)
  • Implementation of model validation and testing with Python
  • Implementation of model monitoring with Grafana

Service S (sketched below):

  • Millions of REST API calls per hour using asyncio
  • Parsing and filtering nested JSON data
  • Storing results on S3
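
A minimal sketch of the high-throughput call pattern in Service S, assuming aiohttp; the endpoint and JSON shape are hypothetical, and a semaphore bounds concurrency so the client stays within API rate limits:

  import asyncio

  import aiohttp  # pip install aiohttp

  async def fetch(session: aiohttp.ClientSession, sem: asyncio.Semaphore, url: str) -> dict:
      async with sem:
          async with session.get(url) as resp:
              data = await resp.json()
              # filter the nested JSON down to the fields we keep
              return {"id": data.get("id"), "items": data.get("payload", {}).get("items", [])}

  async def main(urls: list[str]) -> list[dict]:
      sem = asyncio.Semaphore(200)  # tune to the API's rate limits
      async with aiohttp.ClientSession() as session:
          # results would then be uploaded to S3, e.g. with boto3
          return await asyncio.gather(*(fetch(session, sem, u) for u in urls))

  # asyncio.run(main([f"https://api.example.com/items/{i}" for i in range(1_000)]))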

Technologies used:

  • Languages: Python, Java, TypeScript, Kotlin
  • Monitoring: CloudWatch, Grafana, Tableau
  • Databases: MongoDB, DynamoDB, PostgreSQL, Exasol
  • Message processing: SNS, SQS
  • Provisioning: Terraform, Pulumi, Serverless (CloudFormation)
  • Containers: Docker, ECR, ECS
  • Unit tests: PyTest

Jul 2018 - Sep 2021
3 years 3 months
Cologne, Germany
Hybrid

Senior Big Data Consultant (extern)

REWE Systems GmbH

Conceptualization and implementation of hybrid environments on Google Cloud Platform:

  • Provisioning of GCP infrastructure with Terraform and later with Ansible
  • Redundant connectivity and encryption of data between GCP and on-premise systems
  • Provisioning of MapR and Spark environments on GCP
  • Setup of real-time data replication from on-premise tables to GCP
  • Integration with REWE services (Active Directory, DNS, Instana, etc.)

Development of REST APIs for machine learning models using Flask

Implementation of persistent storage based on MapR for Kubernetes cluster

Operating of MapR clusters: upgrades, extensions, troubleshooting of services and applications

Synchronization of a Kafka cluster with MapR Streams using Kafka Connect (a registration sketch follows)
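
Registering such a connector through the Kafka Connect REST API could look like this Python sketch; the connector class, names and host are placeholders rather than the actual configuration used:

  import requests

  connector = {
      "name": "mapr-streams-sync",  # hypothetical connector name
      "config": {
          "connector.class": "com.example.StreamsSinkConnector",  # placeholder class
          "topics": "clickstream",                                # placeholder topic
          "tasks.max": "2",
      },
  }

  # Kafka Connect exposes a REST API; POST /connectors registers a new connector
  resp = requests.post("http://connect:8083/connectors", json=connector, timeout=10)
  resp.raise_for_status()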

Design and implementation of ETL pipelines, synchronization and integration of MapR clusters with different data sources (e.g. DB2 and Teradata warehouses)

Onboarding of new internal REWE customers to MapR platforms

Consulting management on technical topics and future developments in the Big Data field

Proposals and PoCs for solutions to security topics (e.g. constrained delegation on F5 or authentication for OpenTSDB)

Developer in data science projects:

  • Development of market classification models
  • Visualization of data and predictions with Jupyter and Grafana
  • Integration with JIRA

3rd-level support

Sep 2016 - May 2018
1 year 9 months
Munich, Germany

Senior Big Data Architect

Allianz Technology SE

Management of a large-scale, multi-tenant, secure and highly available Hadoop infrastructure supporting rapid data growth for a wide spectrum of innovative customers

Pre-sales: onboarding of new customers

Providing architectural guidance, planning, estimating cluster capacity, and creating roadmaps for Hadoop cluster deployments

Design, implementation and maintenance of enterprise-grade secure Hadoop environments (Kerberos, LDAP/AD, Sentry, encryption in transit, encryption at rest)

Installation and configuration of Hadoop multi-tenant environments, updates, patches, version upgrades

Creating runbooks for troubleshooting, cluster recovery and routine cluster maintenance

Troubleshooting Hadoop-related applications, components and infrastructure issues at large scale

3rd-Level-Support (DevOps) for business-critical applications and use cases

Evaluation and proposals of new tools and technologies to meet the needs of the global organization (Allianz Group)

Close collaboration with infrastructure, network, database, application, business intelligence and data science units

Developer in Fraud Detection projects including machine learning

Design and setup of a Microsoft R Open (Revolution R) data science model training platform for Fraud Detection on Microsoft Azure and on premises, using Docker and Terraform

Developer in Supply Chain Analytics projects (e.g. GraphServer, which allows executing graph queries on data stored in HDFS)

Transformation of the team's internal processes to the Agile/Scrum framework

Developer of Kafka-based use cases:

ClickStream:

  • Producer: aggregates URLs clicked on webpages, streamed via a REST API or from other sources (e.g. Oracle)
  • Consumer: a Flink job that, after pre-processing (sanity checks, extraction of time information), puts the data on HDFS as XML files (consumer logic illustrated below)
  • Used stack: Java, Kafka, Cloudera, SASL, TLS/SSL, Sentry, YARN, Flink, Cassandra
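
For illustration, the consumer logic rendered in Python (the production consumer was a Flink job in Java); topic, broker and the XML layout are placeholders:

  import xml.etree.ElementTree as ET
  from datetime import datetime, timezone

  from kafka import KafkaConsumer  # pip install kafka-python

  consumer = KafkaConsumer("clickstream", bootstrap_servers="kafka:9092")  # placeholders

  for msg in consumer:
      url = msg.value.decode("utf-8").strip()
      if not url.startswith("http"):
          continue  # sanity check: drop malformed records
      # extract time information from the Kafka record timestamp (milliseconds)
      ts = datetime.fromtimestamp(msg.timestamp / 1000, tz=timezone.utc)
      event = ET.Element("click", url=url, time=ts.isoformat())
      xml_record = ET.tostring(event)  # in production, appended to XML files on HDFS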

Classification of documents:

  • Producer: custom written producer that reads documents from a shared file system and writes them into Kafka
  • Consumer: a Spark Streaming job that, after pre-processing, sends documents to the UIMA platform for classification; the classified data is then stored on HDFS for further batch processing
  • Used stack: Java, Kafka, Spark (streaming), Cloudera, SASL, TLS/SSL, Sentry, YARN, UIMA

Graph database (PoC): managing graphs via a Kafka interface:

  • Producer: Twitter, news agency sites, etc.
  • Consumer: converts articles and messages into graph queries and executes them on graphs using Gremlin
  • Used stack: Java, Python, Kafka, Cassandra, Gremlin, KeyLines (for visualization of graphs; JavaScript), Google Cloud

Jun 2014 - Jul 2016
2 years 2 months
Berlin, Germany

System Architect Web Operations

The unbelievable Machine Company GmbH

Sep 2012 - Jun 2014
1 year 10 months
Cologne, Germany

System Operations

Werkenntwen GmbH

Jan 2009 - Sep 2012
3 years 9 months
Wuppertal, Germany

Postdoc

Bergische Universität Wuppertal

Oct 2006 - Dec 2008
2 years 3 months
Aachen, Germany

Postdoc

Rheinisch-Westfälische Technische Hochschule

Languages

Russian
Native
English
Intermediate
French
Intermediate
German
Intermediate

Education

Jan 2001 - Sep 2006

Université catholique de Louvain

Ottignies-Louvain-la-Neuve, Belgium

Sep 1998 - Jun 2000

Moscow Institute of Physics and Technology

M.S. · High energy physics · Dolgoprudny, Russian Federation

Sep 1994 - Jun 1998

Moscow Institute of Physics and Technology

B.S. · High energy physics · Dolgoprudny, Russian Federation

Certifications & licenses

AWS Certified Data Engineer - Associate