Lead the end-to-end data lifecycle for protein-expression projects by continuously ingesting and curating a growing dataset (currently ~4,500 records).
Implement collection of AlphaFold-predicted 3D protein structures to enable downstream structure-based analysis by molecular-dynamics specialists.
Built reproducible training and inference pipelines in Python, managing code versioning and continuous integration through GitHub in collaboration with machine learning experts.
Work closely with laboratory scientists (molecular biologists and chemists) to design ad-hoc experiments, grounded in principles of experimental design, that validate and refine model predictions.
Facilitated cross-functional collaboration by presenting project updates to laboratory scientists, explaining complex technical concepts in plain terms, and incorporating their feedback to optimise model predictions, resulting in a 20% efficiency gain.
Secured senior leadership approval to invest in a critical ML project by presenting results and demonstrating its strategic value.
Develop a user-friendly Streamlit web application that lets laboratory scientists run predictions and inspect results independently, integrating ML outputs directly into lab workflows and improving model design and usability.
Sep 2021 - Aug 2022
1 year
Siena, Italy
Intern - Vaccines R&D
GSK
Initiated and executed curation of the protein-expression dataset, aggregating and standardising records from multiple departments and countries (including teams in Belgium and the United States).
Worked independently with experts from diverse backgrounds (molecular biologists, chemists and data teams) to understand their research needs, align on project goals, and translate these requirements into a harmonised, ML-ready dataset for computational biology applications.
Conducted exploratory model prototyping using Microsoft Azure AutoML to benchmark algorithms and accelerate model selection.
Languages
Italian
Native
English
Intermediate
Education
Sep 2021 - Oct 2022
University of Padova
Postgraduate course · Machine Learning & Big Data for Precision Medicine