Amit P.

Senior AWS Data Engineer

Bokaro Steel City, India

Experience

Jan 2023 - Feb 2025
2 years 2 months
Bengaluru, India

Senior AWS Data Engineer

Keeno Technologies

  • Developed Python scripts to automate processes, perform data analysis, consume streaming APIs, process data streams using pandas DataFrame, and stage data for aggregation, cleansing, and building data marts

  • Implemented analytic predictions on machine learning data, data visualization, and business logic integration

  • Created ELT pipelines with a visual editor over DynamoDB and Kinesis sources, and computed statistics using Python and Spark Streaming

  • Used AWS Lambda with the Snowflake engine in ECR templates and built AWS Glue transformations for Redshift Spectrum, moving data from S3 and external sources (see the sketch after this section)

  • Enabled data scientists to leverage GCP and Azure Data Lake pipelines for research and experimentation

  • Built automated Glue templates and Lambda scripts on EC2 for batch and streaming data platforms serving global partners

  • Segregated data transfer files in S3, enabling ML and BI components for research and analysis

  • Environment: Python, Spark, AWS Glue, S3, Databricks, Kinesis, Lambda, CloudFormation, DynamoDB, CodePipeline, CodeBuild, Step Functions, Athena, Snowflake, Autosys, Airflow, NiFi, Glue DataBrew
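
A minimal sketch of the Glue-to-Redshift-Spectrum pattern referenced above: a Glue job that converts raw S3 data into partitioned Parquet registered in the Glue Data Catalog so Redshift Spectrum can query it. The bucket names, database, table, and partition column are hypothetical placeholders, not the actual client setup.

    # Hypothetical Glue job: convert raw CSV in S3 to partitioned Parquet for Redshift Spectrum.
    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read raw CSV files landed in S3 (placeholder bucket/prefix).
    raw = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://example-raw-bucket/orders/"]},
        format="csv",
        format_options={"withHeader": True},
    )

    # Write partitioned Parquet and update the Glue Data Catalog so Redshift Spectrum
    # (through an external schema over this database) can query the table.
    sink = glue_context.getSink(
        connection_type="s3",
        path="s3://example-curated-bucket/orders/",
        enableUpdateCatalog=True,
        partitionKeys=["order_date"],
    )
    sink.setFormat("glueparquet")
    sink.setCatalogInfo(catalogDatabase="curated_db", catalogTableName="orders")
    sink.writeFrame(raw)

    job.commit()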

Oct 2021 - Dec 2022
1 year 3 months
United States

Analytical Data Engineer

Brillio

  • Analyzed multiple source systems and extracted data using Apache Spark on Databricks

  • Transformed and loaded data to S3 and built ELT pipelines for clients such as UMG, Realtor, KFC, McDonald's, and investment partners

  • Built AWS Glue transformations for Redshift Spectrum and reverse pipelines, enabling data scientists to leverage GCP environments

  • Coordinated with BI teams to provide reporting data, designed and developed complex data pipelines, and wrote production code for logging and querying

  • Constructed ETL and ELT pipelines with productivity and data quality checks (see the test sketch after this section)

  • Built automated Glue templates and Lambda scripts on EC2 and RDS for batch and streaming data platforms

  • Exported the data catalog, CloudWatch metrics, and Step Functions workflows, and versioned code with GitHub and GitLab

  • Environment: Python, Spark, AWS Glue, S3, Lambda, CloudFormation, DynamoDB, CodePipeline, CodeBuild, Pytest, Step Functions, Athena, Snowflake, Autosys, Shell Scripting
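
A small pytest-style sketch of the kind of data quality checks wired into those pipelines (the Environment above lists Pytest). The table path, key column, and checks are illustrative assumptions rather than the actual client tests.

    # Hypothetical pytest data quality checks run after an ELT load (PySpark).
    import pytest
    from pyspark.sql import SparkSession


    @pytest.fixture(scope="session")
    def spark():
        return SparkSession.builder.master("local[2]").appName("dq-checks").getOrCreate()


    @pytest.fixture(scope="session")
    def orders(spark):
        # Placeholder path; in the pipeline this would point at the curated S3 layer.
        return spark.read.parquet("s3a://example-curated-bucket/orders/")


    def test_no_null_primary_keys(orders):
        # Every row must carry an order_id (hypothetical key column).
        assert orders.filter(orders.order_id.isNull()).count() == 0


    def test_row_count_above_zero(orders):
        # Guard against silently empty loads.
        assert orders.count() > 0


    def test_no_duplicate_keys(orders):
        assert orders.count() == orders.select("order_id").distinct().count()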

Jul 2018 - Sep 2021
3 years 3 months
Bengaluru, India

Senior Data Engineer

Enum Informatics Private Ltd

  • Extracted data from SQL and Oracle sources and bulk-loaded into AWS S3

  • Built ETL pipelines for retail clients on a big data architecture and migrated metadata and Glue schemas into the business layer

  • Used AWS Glue for transformations and scalable data loads into the processed layer of the data lake, and exposed data via Athena views (see the sketch after this section)

  • Coordinated with BI teams for reporting and analysis, designed models and complex data pipelines, wrote production code in Visual Studio Code

  • Constructed ETL workflows with productivity and data quality checks

  • Technologies: Python, Spark, AWS Glue, S3, Athena, KMS, RDS
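
A rough illustration of the extract-and-expose pattern above: bulk-loading an Oracle table into the processed S3 layer with Spark over JDBC, then publishing an Athena view. Connection details, table, bucket, and view names are placeholders, and the processed table is assumed to already be registered in the Glue Data Catalog (for example, by a crawler).

    # Hypothetical bulk extract from Oracle into S3 Parquet, then an Athena view over it.
    import boto3
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("oracle-to-s3").getOrCreate()

    # Pull the source table over JDBC (placeholder connection settings).
    orders = (
        spark.read.format("jdbc")
        .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1")
        .option("dbtable", "SALES.ORDERS")
        .option("user", "etl_user")
        .option("password", "REDACTED")
        .option("fetchsize", "10000")
        .load()
    )

    # Land the data in the processed layer of the data lake.
    orders.write.mode("overwrite").parquet("s3a://example-datalake/processed/orders/")

    # Expose a curated Athena view over the catalogued table.
    athena = boto3.client("athena", region_name="us-east-1")
    athena.start_query_execution(
        QueryString=(
            "CREATE OR REPLACE VIEW reporting.v_orders AS "
            "SELECT order_id, customer_id, order_total FROM processed.orders"
        ),
        QueryExecutionContext={"Database": "processed"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )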

Jul 2017 - Jun 2018
1 year
Bengaluru, India

Senior Data Engineer

KPIT

  • Extracted data from SQL sources and bulk-loaded into AWS S3

  • Migrated metadata and Glue schemas into the business layer and used AWS Glue for transformations and data loads into the processed layer

  • Exposed processed data via Athena views

  • Coordinated with BI teams to deliver reporting data, designed models and complex data pipelines

  • Technologies: Python, Spark, AWS Glue, S3, Athena

Jun 2016 - Jun 2017
1 year 1 month
Oakland, United States

Senior Data Engineer

Kaiser Permanente

  • Designed and implemented scalable big data solutions with Hadoop ecosystem tools: Hive, MongoDB, Spark Streaming

  • Engineered real-time data pipelines using Kafka and Spark Streaming and stored data as Parquet on HDFS (see the sketch after this section)

  • Implemented data transformations with Pig, Hive scripts, Sqoop, and Java MapReduce jobs

  • Integrated analytics using Apache NiFi and Neo4J, applied Agile methodologies with daily scrums and sprint planning

  • Architected data solutions leveraging AWS Glue, S3, Redshift, and Athena for real-time analytics

  • Developed and optimized AWS Glue jobs for ETL, implemented data cataloging and metadata management

  • Reduced ETL execution time by 35% and processing costs by 20%

  • Mentored junior engineers on AWS Glue best practices

  • Built ELT pipelines with Airflow, Python, dbt, Stitch, and GCP solutions and guided analysts on dbt modeling and incremental views

  • Managed ETL processes with AWS Glue, Lambda, Kinesis, and Snowflake using dbt and Matillion

  • Utilized AWS Glue DataBrew for visual data preparation and self-service wrangling

  • Worked on MongoDB CRUD, indexing, replication, and sharding

  • Extensive experience with Apache Airflow and scripting for scheduling and automation

  • Designed Wherescape RED data flows and mappings, implemented Azure Data Factory and Databricks solutions

  • Built real-time log pipelines with Cribl, extracted feeds with Kafka and Spark Streaming, and wrote Hive and Sqoop jobs over petabyte-scale data

  • Implemented Apache NiFi topologies, MapReduce jobs, Oozie workflows, and applied Agile/DataOps

  • Technologies: Hadoop, Hive, Sqoop, Pig, Java, NiFi, MongoDB, Python, Scala, Spark, Oozie, HBase, Cassandra, Trifacta (HIPAA-regulated environment)
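
A minimal Spark Structured Streaming sketch of the Kafka-to-Parquet-on-HDFS pipeline described above. The broker, topic, event schema, and paths are hypothetical, and the job assumes the Spark Kafka connector package is available on the cluster.

    # Hypothetical Kafka -> Spark Structured Streaming -> Parquet-on-HDFS pipeline.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    # Assumed event schema for the example.
    schema = StructType([
        StructField("record_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_ts", TimestampType()),
    ])

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "events-topic")
        .option("startingOffsets", "latest")
        .load()
        .select(from_json(col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    # Continuously append Parquet files to HDFS, with checkpointing for recovery.
    query = (
        events.writeStream.format("parquet")
        .option("path", "hdfs:///data/events/")
        .option("checkpointLocation", "hdfs:///checkpoints/events/")
        .outputMode("append")
        .start()
    )
    query.awaitTermination()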

Oct 2014 - May 2016
1 year 8 months
Atlanta, United States

Senior Data Engineer

The Home Depot

  • Implemented CI/CD processes with GitLab, Python, and Shell scripting for automation

  • Developed AWS Lambda functions for nested JSON processing and constructed scalable AWS data pipelines with VPC, EC2, S3, ASG, EBS, Snowflake, IAM, CloudFormation, Route 53, CloudWatch, CloudFront, CloudTrail

  • Configured ELBs and Auto Scaling for fault tolerance and cost efficiency

  • Managed metadata and lineage in AWS Data Lake using Lambda and Glue

  • Integrated Hadoop jobs with Autosys and developed sessionization algorithms for website analytics

  • Developed RESTful and SOAP APIs with Swagger and tested with Postman

  • Led data migration projects using HVR, StreamSets, and Oracle GoldenGate for real-time replication

  • Managed ETL with Informatica PowerCenter and built StreamSets pipelines

  • Configured AWS DMS and designed AWS API Gateway and Lambda integrations with Snowflake and DynamoDB

  • Built ETL pipelines from S3 to DynamoDB and Snowflake and performed data format conversions (see the sketch after this section)

  • Used Trifacta for data wrangling and modeled data with star and snowflake schemas and slowly changing dimensions (SCD)

  • Created ML POCs, Sqoop imports to HDFS, Hive tables, and Spark applications in Scala

  • Supported SIT, UAT, and production

  • Technologies: Hadoop, Hive, Zookeeper, MapR, Teradata, Spark, Kafka, NiFi, MongoDB, Python, AWS, Scala, Oozie
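
A condensed sketch of the Lambda-based nested-JSON handling and S3-to-DynamoDB loading mentioned above. The S3 trigger, flattening rule, and table name are illustrative assumptions.

    # Hypothetical Lambda: flatten nested JSON files landed in S3 and load them into DynamoDB.
    import json
    from decimal import Decimal

    import boto3

    dynamodb = boto3.resource("dynamodb")
    s3 = boto3.client("s3")
    TABLE = dynamodb.Table("example-orders")  # placeholder table name


    def flatten(obj, prefix=""):
        """Flatten nested dicts into dot-separated top-level attributes."""
        flat = {}
        for key, value in obj.items():
            name = f"{prefix}{key}"
            if isinstance(value, dict):
                flat.update(flatten(value, f"{name}."))
            else:
                flat[name] = value
        return flat


    def handler(event, context):
        # Triggered by S3 put events; each record points at one JSON object.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            # parse_float=Decimal keeps numeric values DynamoDB-compatible.
            item = flatten(json.loads(body, parse_float=Decimal))
            TABLE.put_item(Item=item)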

Feb 2012 - Sep 2014
2 years 8 months
Peoria, United States

Data Engineer

Caterpillar

  • Designed and implemented end-to-end data pipelines on GCP and AWS using Airflow, Docker, and Kubernetes

  • Built ETL/ELT processes for GCP data ingestion and transformation and deployed Cloud Functions to load CSVs into BigQuery (see the sketch after this section)

  • Developed Informatica PowerExchange and Data Quality solutions, improving data accuracy by 50%

  • Processed Google Pub/Sub data to BigQuery with Dataflow and Python

  • Performed data analysis, migration, cleansing, and integration with Python and PL/SQL

  • Developed logistic regression models and near real-time Spark pipelines

  • Implemented Apache Airflow for pipeline orchestration

  • Technologies: GCP (BigQuery, Cloud Functions, Dataflow, Pub/Sub), AWS, Airflow, Python, Spark, SQL, Docker, Kubernetes, Pandas, NumPy, Scikit-learn
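
A brief sketch of the Cloud Function pattern referenced above: a Cloud Storage-triggered function that loads a newly uploaded CSV into BigQuery. The project, dataset, and table names are placeholders.

    # Hypothetical Cloud Function: load a newly uploaded CSV from GCS into BigQuery.
    from google.cloud import bigquery

    bq = bigquery.Client()
    TABLE_ID = "example-project.analytics.telemetry"  # placeholder destination table


    def load_csv_to_bigquery(event, context):
        """Background function triggered by a google.storage.object.finalize event."""
        uri = f"gs://{event['bucket']}/{event['name']}"
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,                 # infer the schema from the file
            write_disposition="WRITE_APPEND",
        )
        load_job = bq.load_table_from_uri(uri, TABLE_ID, job_config=job_config)
        load_job.result()  # wait so load errors surface in the function logs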

Summary

Over 13 years of experience in the field of Data Engineering.

Database development spanning design and architecture, system integration, infrastructure readiness, implementation, maintenance, and support, with experience in cloud platforms such as AWS and Microsoft Azure, including Azure security services and Azure Data Factory.

Worked on project upgrade and migration functionality using modern tooling and APIs.

Expert in understanding data and in designing and implementing enterprise platforms such as data lakes and data warehouses.

Years of experience with Databricks alongside AWS and GCP framework tools, including AWS Glue Studio, Athena, and Spark clusters.

Good understanding of relational databases, with hands-on experience writing applications against databases, performance tuning, and view optimization across modern cloud and on-premises tool frameworks.

Extensive experience working on AWS EMR Clusters and building optimized Glue Jobs as per business requirements.

Developed Spark applications using the Spark SQL, DataFrame, and Dataset APIs, integrated with API Gateway.

Created a Glue job as a reference implementation for de-identifying PHI columns, the objective being a worked-through example that Data Operations can use to provide de-identified data for integrations.

Authored a PHI de-identification guide covering Glue DataBrew recipes and jobs that de-identify a large sample for an identified integration client, with the DataBrew recipe definitions stored in Git.

Data is ingested into the main HAP-DEV stack for the reference implementation, which reads from the de-identified bucket and writes to the proper ingestion location in ingress for the client data type; a supporting script is required so the dbt models run as expected for that integration. The reference implementation is reviewed and accepted by DataOps, and both it and the PHI guidelines are documented in the ADO wiki.

The sample client integration identified for the reference implementation has "hia-hoc" in its title; the partitioned hia-hoc-ingress AWS bucket feeds migration into a Databricks data warehouse used to build training datasets for the business, providing a Spark cluster environment to analyze gigabytes of batch and real-time data with streaming analytics. A minimal de-identification sketch follows.
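
A minimal PySpark sketch of the PHI de-identification step behind that reference implementation, hashing identified PHI columns before the data is written to the de-identified bucket. The column list, salt handling, and bucket paths are hypothetical; an actual implementation would follow the DataBrew recipes and PHI guidelines above.

    # Hypothetical PHI de-identification: salted hash of PHI columns before landing de-identified data.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, concat, lit, sha2

    spark = SparkSession.builder.appName("phi-deidentify").getOrCreate()

    PHI_COLUMNS = ["patient_name", "ssn", "date_of_birth"]  # illustrative column list
    SALT = "REPLACE_WITH_SECRET_SALT"  # in practice, retrieved from a secrets store

    raw = spark.read.parquet("s3a://example-raw-bucket/sample/")

    deidentified = raw
    for phi_col in PHI_COLUMNS:
        # One-way, salted SHA-256 hash so joins stay possible without exposing PHI.
        deidentified = deidentified.withColumn(
            phi_col, sha2(concat(lit(SALT), col(phi_col).cast("string")), 256)
        )

    deidentified.write.mode("overwrite").parquet("s3a://example-deidentified-bucket/ingress/")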

Used Kafka Connect for consuming client source records, sinking, and transformation; applied Scala reflection, annotations, and data binding on cluster tooling to avoid code duplication and handle serialization errors when consuming AWS events; wrote pytest use cases for Lambda-based data processing; and built and deployed large-scale data processing pipelines on distributed storage platforms such as HDFS, S3, and NoSQL databases in a production CI/CD environment.

Worked with distributed processing platforms such as Hadoop, Spark, and PySpark.

Built Hive tables as part of end-to-end big data solutions covering data ingestion, data cleansing, ETL, data mart creation, and exposing data to consumers; handled complex data sets from different sources and converged them onto a single compute platform using both static (batch) and real-time ingestion methodologies.

Advanced SQL query authoring and working familiarity with NoSQL databases, exchanging data via microservices and API Gateway and across languages such as R and Python, with Unix command and shell scripting on servers. Skilled in Scala-based ETL/ELT extraction, data modeling, and integrations with internal and external business platforms, including handling missing data, data-processing templates, and categorical data in R.

Languages

English
Advanced
Hindi
Advanced
Marathi
Advanced

Education

Oct 2009 - Jun 2012

AIET College, Rajasthan Technical University

BCA, Specialisation in Computer Science and Mathematics · India

Certifications & licenses

AWS 2.0 Cloud

GCP

Python
