Davide Imperati

Consultant – Research Lead – NLP / GPT and Ontology Engineer

Msida, Malta

Experience

Sep 2023 - Present
1 year 9 months
Remote

AI Lead

BlackRock

AI-driven whole-portfolio optimization and buy-side research summarization.

Jan 2023 - Jul 2023
7 months
United Kingdom
Remote

Consultant – Research Lead – NLP / GPT and Ontology Engineer

Open University

Explored the potential of the then-novel GPT language models in conjunction with graph databases, including fine-tuning of GPT prompts and a PoC on automated, AI-driven data linkage.

Details of the project are still confidential.

The first product, relating to AI-assisted career guidance, was released on 26 April 2023.

Technologies: Research, Jupyter Notebook, Python, Pandas, NumPy, Scikit-learn, FastAPI, Flask, Django, Java, Jena, Git, GitHub, CI/CD, Jira, TDD, DevOps, Terraform, Docker, Azure Cloud, Azure App, Azure OpenAI, DBT, API, sFTP, YARRRML, RMLMapper, GraphDB, OntoRefine, SQL, SPARQL, Graph Database, OWL, RDF, Ontologies, GPT-3, GPT-3.5-turbo, GPT-4 (including programmatic interaction with the APIs exposed by OpenAI and Azure).

Jul 2022 - Dec 2022
6 months
London, United Kingdom

Consultant – Tech Lead - Data Engineer and Semantic Language Engineer

AstraZeneca

Redesigned, rebuilt, and migrated the semantic engine supporting the metadata for a number of data sources from the incumbent third-party tool to an in-house replacement.

The project required replacing the current implementation of the semantic data hub. The solution required designing a product able to handle the volume of metadata collected across multiple divisions, developing a proof of concept (PoC), confirming the PoC with the stakeholders, and delivering a fully fledged implementation suitable for productionization. The solution consisted of a set of extractors based on Meltano, plus custom API connectors and ingestors written in Python, to harvest the metadata from different sources. The metadata was then staged in Postgres and cleansed using DBT transformations. The cleansed metadata was mapped to the internal ontologies using RMLMapper, transformed into triples and N-Quads, and loaded into AllegroGraph.

Once in AllegroGraph, we used SPARQL queries to augment data across different graphs and extract knowledge from the bulk of the information. The solution is designed to be deployed to AWS using a combination of native services (Airflow, S3, RDS Postgres, EKS) and containers (AllegroGraph, custom transformers, extractors, loaders, Meltano, DBT, RMLMapper). The final workload yielded datasets in excess of 50 million triples.

Technologies: Stakeholder engagement, YARRRML, RMLMapper, AllegroGraph, SQL, SPARQL, Graph Database, OWL, RDF, Ontologies, Protégé, Meltano, DBT, Postgres, Snowflake, Snowpipe, Matillion, Python, FastAPI, Flask, Django, Git, GitHub, GitHub Actions, CI/CD, Jira, TDD, DevOps, AWS, Terraform, Docker, Docker Compose, Airflow, RDS Postgres, Cloud, API, sFTP.

Nov 2021 - Aug 2022
10 months
London, United Kingdom

Consultant – Tech Lead - Data Engineer and Machine Learning Engineer

Many Pets (Bought By Many)

Onboarded internal and external datasets to support Customer Service and Marketing.

Projects Supported:

  • Automated the import of PureCloud data and reformatted it to specification to enable advanced call center monitoring and analytics of the activities. The project provided a 15% performance improvement for the internal call center and a 42% performance improvement for the third-party call center.
  • Automated the import of Mention-Me data and made it available to the marketing department for analytics. This initial enabler allowed the marketing department to start analysing subscriptions and referrals using automated tools instead of manual processing, with great savings of time.
  • Set up the Airflow instance to run database manipulations with DBT and analytics tools in a containerized environment, to improve performance and decouple DBT dependencies from Airflow dependencies.

Technologies: Stakeholder engagement, Python, Pandas, Scipy, FastApi, Flask, Django, GIT, GitHub, Jenkins, Jira, ClickMe, CI/CD, TDD, DevOps, Terraform, Docker, Fivetran, BigQuery, Snowflake, Composer-Airflow, Cloud GCP, DBT, API, sFTP, Vertex AI.

Mar 2021 - Nov 2022
1 year 9 months
London, United Kingdom

Consultant – Tech Lead - Data Engineer and Machine Learning Engineer

Tesco Plc

Translated data science models (R, Jupyter Notebook, MATLAB) into production-ready applications in the Azure cloud and on an on-prem Hadoop/Spark cluster.

Projects supported:

  • Commodities trading: the project produced multi-million savings in the procurement of wheat and corn. Note that we were trading in the bullish market leading up to the Ukrainian crisis, so the market situation might account for part of the performance.
  • Product match: automatically populated the best match between internal and competitor products, reducing manual intervention by a factor of 4; the average match time per item went down from about 3 minutes to less than a minute, and the proposed item was accepted in 96% of cases.
  • Fresh: modelled price reductions for products close to their expiration date; preliminary results hint at a reduction in waste in the order of 20%.

Technologies: Stakeholder engagement, Java (EE), Python, Pandas, NLTK, SciPy, NumPy, Hadoop, Hive, PySpark, FastAPI, Flask, Django, Git, GitHub, Jenkins, Jira, CI/CD, TDD, DevOps, Automated Testing, Load Testing, ETL, Pipelines, Data Preprocessing, Data Lake, Azure, AzureML, Kafka, Spark, SQL, PostgreSQL, Teradata, Refinitiv Point Connect, Bloomberg SAPI.

Apr 2020 - Oct 2020
7 months
London, United Kingdom
Remote

Consultant – Core Data Engineering Lead – Neuron Program

Vodafone

Delivered the core of the migration of Vodafone's Big Data platform to Google Cloud (Team of 15 – Fully Remote, UK, India)

The platform serves all European markets and handles several terabytes of data per day (a rolling retention of about 2-3 petabytes).

Rebuilt the capabilities of the Core Data Engineering squad for the migration of the big data platform to Google Cloud after the impact of the IR35 reform. Delivered the migration under tight time and budget constraints, with only minor delay despite the serious constraints posed by Covid-19.

Initial challenges: The team was impacted by the IR35-related change of policies; the project suffered loss of knowledge, delays, high technical debt, and missing documentation.

Benefits: The team was reinforced, technical debt was assessed and its impact mitigated, and a reduction in scope was agreed with the stakeholders to fit the timeline and budget. The project was delivered with only minor delay despite serious technical, budgetary, and environmental constraints. "Vodafone calls for transformative insights, Google Cloud answers" ([link])

Technologies: Stakeholder engagement, Java (EE), Scala, Python, Pyspark, Github, Jenkins, Jira, CI/CD, TDD/BDD, DevOps, Test Automation, Load/Stress Test, Cost optimization, Google Cloud platform (GCP), multiple services including DataFlow (Apache Beam), Composer (Airflow), DataProc, Cloud Storage, BigQuery, BigTable, Spanner, Pub/Sub, internal microservice architecture based on Kubernetes, Docker, Terraform.

Jul 2019 - Feb 2020
8 months
London, United Kingdom

Consultant – Quant Research / Machine Learning - Lead Developer

Lloyds Banking Group

Revamped the automated trade surveillance platform to meet the criteria set by the auditor (Team of 6 - co-located).

  • Mediated between stakeholders to have them agree on a standardized approach across different asset classes.
  • Mediated between stakeholders and developers to ensure delivery met the requirements.
  • Defined templates for the efficient and standardized implementation of all analytics.
  • Implemented a set of critical high-end analytics using NLP, ML, and advanced quant methods.

Initial challenges: A pending review from the regulator. The project suffered from a disconnect between stakeholders, compliance requirements, and developers; the platform was legacy; the development team suffered a high attrition rate, and thus loss of knowledge; documentation was partial.

Benefits: Passed the audit (a serious cost reduction), provided meaningful alerts (67% spam reduction for downstream teams), and the platform was consolidated and made extensible.

Asset Classes: FX spot/options, rates futures/bonds/swaps, repo, bespoke OTC.

Technologies: Stakeholder engagement, Java (EE), Python, Pandas, NLTK, Scipy, Numpy, Pyspark, Dask, Bitbucket, Jenkins, Jira, CI/CD, TDD, DevOps, Risk Scenarios, Automated Testing, Load Testing.

Apr 2019 - Jun 2019
3 months
London, United Kingdom

Interim Director of Product

EMY Design

Managed the start-up of the company from ground zero to its first viable product, with particular focus on e-commerce exposure and click-through-rate optimization.

Jan 2019 - Apr 2019
4 months
London, United Kingdom

Consultant – Principal Data Scientist

News UK – The Times

Delivered "Project James". A reinforcement learning AI for direct marketing optimization.

News UK won a Google-sponsored innovation grant aimed at delivering an advanced solution to real marketing problems. Attrition of the initial investigator created the conditions for reassigning the task. The intervention required assessing the partially implemented project, baselining the approach, and rebuilding the reinforcement learning core using state-of-the-art tools, then tuning and delivering a production-viable tool within the scheduled time frame.

Challenges: Time pressure for delivery. Partially implemented platform with partial documentation. Full research project with no previous case study to leverage for comparison.

Benefits: "JAMES has revolutionised churn further, and advisors informed by readers interests underpin an award winning contact centre" ([link]

Technologies: Python, pandas, scipy, numpy, TensorFlow, Django, Flask, github, jenkins, jira, GitOps, CI/CD, DevOps, Kubernetes, Docker, Terraform, Microservice Architecture

Jul 2018 - Dec 2018
6 months
London, United Kingdom

Consultant – Principal Data Scientist

News UK – The Times

Delivered the propensity model and API (Team of 5 - co-located).

The client wanted to improve the conversion rate on the digital platform and deliver a personalized user experience, so we piloted an online propensity model. The model follows each user of The Times Digital in real time and predicts the best opportunity for calls to action, e.g. subscriptions, cross-sell, up-sell.

Challenges: The model had to work at high throughput (1,000+ predictions/sec) and low latency (<250 ms maximum response time).

Benefits: It increased subscriptions and cross-sales by 5% and 9%, respectively, and piloted the deployment of high-throughput APIs in News UK's brand-new k8s cluster.

"Best Ever Growth for The Times & The Sunday Times Thanks to Usable Data Science" ([link]

Technologies: Stakeholder management, python, pandas, nltk, scipy, numpy, API, django, nginx, docker, Kubernetes (k8s), Terraform, Microservice Architecture, TensorFlow, github, jenkins, jira, CI/CD, DevOps, New Relic.

Mar 2017 - Aug 2018
1 year 6 months

Vice President

JP Morgan Chase

Managed the delivery of the Cloud Logging and Monitoring Platform (Team of 20 across 3 sites).

In the framework of public cloud adoption, JPMC needed a standardized, large-scale logging and monitoring system to meet cyber-security requirements for all applications in the public cloud.

Davide joined the team after the PoC of the platform. He reviewed the architecture and implementation, then scaled the platform to handle 5 TB of data a day (approximately 5 billion messages, with peaks of 1.3 billion during the first hour of trading).

Challenges: A very new project under hard constraints in terms of data protection, and thus limited availability of approved cloud services; very challenging requirements in terms of SLO/SLA, high availability, disaster recovery, and sustained recovery.

Benefits: The platform enabled monitoring of an initial set of 5 mission-critical applications in the public cloud (AWS). It pioneered new technologies, produced a number of architectural patterns new to JPMC, and demonstrated its ability to scale to a larger number of monitored applications at the push of a button.

Technologies: Leadership, AWS (API Gateway, Route53, S3, DynamoDB, Kinesis, Elastic Beanstalk, Lambda, ELB, IAM, CloudWatch, CloudTrail, etc.), Boto, Terraform, Fluentd, Kafka, Kafka Streams (replaced by Kinesis after SOC3), Kinesis Firehose, NiFi, Elasticsearch, Logstash, Kibana, Java (EE), Python, Bitbucket, Jenkins, Jira, CI/CD, TDD, BDD, DevOps, Hera (JPMC Terraform-based API), Automated Testing, Load Testing, Microservice architecture, Docker, Kubernetes (k8s), Datadog. L1 and L3 support during rollout and production, respectively.

Mar 2016 - Feb 2017
1 year

Vice President

JP Morgan Chase

Set the basis for standardized regulatory reporting across all businesses (regulatory driven - Team of 4).

Due to regulatory change, the company was required to produce reporting aggregated across all lines of business (LoB). This required the standardization of thousands of terms used for reporting ('loan' has a different meaning in retail than in derivatives). We created controlled vocabularies, devised and automated the procedures for metadata management, served the dictionaries and the reference data through a constellation of REST-API-based microservices, and promoted numerous educational interventions across the organization.

Challenges: High exposure to the regulators; a huge number of unlisted terms needing attention; a serious need to mediate between high-ranking stakeholders (senior executives and managing directors).

Benefits: We mitigated the regulatory risk and provided tools to gain insight into the corporate dynamics.

Asset Classes: FX spot/options, rates futures/bonds/swaps, derivatives, OTC.

Technologies: Java (EE), Spring, Python, RDF, OWL, SPARQL, Semantic web standards, Ontologies, Semantic Wiki, Knowledge graphs, Graph Database, Neo4j, BigQuery (Blazegraph), ISO 20022, Bitbucket, Jenkins, Jira, CI/CD, TDD, BDD, DevOps, Docker, Microservices.

Nov 2014 - Feb 2016
1 year 4 months

Vice President

JP Morgan Chase

Developed the meta-analytics for the Corporate and Investment Bank (CIB) division of the bank.

As part of the digital transformation initiative, JPMC aimed to label and score all data repositories and all software products owned by the line of business. We defined the data quality metrics and formal ontologies for representing logical data models (LDM), scanned the metadata of all databases to infer the physical data models (PDM), and linked the two through heuristics. The results were manually refined by Information Architects.

Challenges: Very broad collections of heterogeneous data; data quality was not always prime; some data stewards were only partially cooperative with the process.

Benefits: The semi-automated approach increased the productivity of the Information Architects by a factor of 4.7.

Technologies: Java, Spring, Python, RDF, OWL, Semantic web standards, Ontologies, Knowledge graphs, Graph Database, BigQuery, ISO 11179, Bitbucket, Jenkins, Jira, CI/CD, TDD, DevOps.

Summary

Davide Imperati's background builds on two decades of academic and corporate experience in quant research, data strategy, and large-scale cloud migration. His technical expertise is complemented by robust soft skills and a deep understanding of business domains in finance, telecom, media, logistics, and digital marketing. He operates during the initial phases of greenfield data-driven projects (PoC to pilot), and has proven experience taking over under-performing data-related projects and delivering them while controlling for budget, time, and resource constraints.

Languages

Italian
Native
English
Advanced
German
Advanced

Education

New York University

PostDoc · United States

Max Planck Institute

PostDoc · Germany

Lorem ipsum dolor sit amet

PhD · Computational Statistics

Certifications & licenses

Certified AWS Cloud Practitioner

Certified PADI Instructor

Certified Scrum Product Owner