Spearheaded collaborative research projects in cardiovascular immunology, utilizing scRNA-seq analysis to elucidate the molecular mechanisms of complement component 5a signaling in innate immune macrophages
Drove the discovery and validation of novel genes and pathways in cardiomyopathies by integrating complex clinical and real-world datasets (e.g., SNP genotyping, electronic medical records)
Engineered a novel 1D-CNN-Transformer hybrid architecture for miRNA-mRNA binding prediction directly from DNA sequences, achieving state-of-the-art performance (AUC 0.903) and significantly reducing false-positive rates compared to benchmark models; released on [link]
Mar 2021 - Jun 2021
4 months
AI Scientist
Kaggle Competition – Molecular Translation
Led a 4-person team to 2nd place among 874 competitors (Kaggle Grandmaster Top 0.1%) in the Molecular Translation Challenge, overcoming low OCR accuracy on a large, corrupted dataset of FDA-sourced scanned chemical molecule images
Engineered a 3-phase model featuring ResNet-Transformer image captioning, candidate generation, and multi-model re-ranking
Boosted model robustness by 40% through molecule-specific data augmentation, achieving an edit distance of 0.54 (vs. a baseline of 0.9+) and enabling accurate conversion of 1.6 million legacy scanned chemical images into machine-readable InChI format
Mar 2019 - Jun 2022
3 years 4 months
Shanghai, China
Research Scientist
Shanghai Institute of Materia Medica
Resolved API integration issues for models such as GPT-3.5-turbo in chemical text-mining tasks, and designed batch data transfer strategies to support large-scale chemical literature processing
Conducted structure-based virtual screening to identify potential inhibitors of the interaction between the SARS-CoV-2 spike protein receptor-binding domain and host cell angiotensin-converting enzyme 2 (ACE2), a key step in viral entry into host cells
Engineered and validated a Random Forest model for Immune Checkpoint Inhibitor (ICI) response prediction using multi-omics data from 281 cancer patients, achieving a state-of-the-art AUC of 0.85 on external validation, significantly surpassing the baseline model (AUC 0.763) and identifying novel biomarkers enriched in crucial immune pathways
Nov 2017 - Jun 2018
8 months
Qingdao, China
Intern
BGI Genomics
Engineered a bit-parallel fuzzy regex pipeline to perform SNP-tolerant cross-species comparative genomics on five Takifugu species, enabling automated annotation of conserved genomic sites
Achieved a 10-fold speedup in querying and accessing 30 GB of genome annotation data using a MySQL database (accessed via a Python library), significantly improving the efficiency of comparative analysis
Building on these results, automated the entire RNA-Seq analysis pipeline on Linux, from read mapping and data cleaning to gene annotation, standardizing the workflow for high-throughput data processing
Jun 2016 - Sep 2016
4 months
Wuhan, China
Data Scientist
National Mathematical Modeling Competition
Won the National Second Prize (among Graduates) in the competition, serving as lead model developer and scientific writer
Developed and applied a Logistic Regression model combined with GWAS statistical testing on 9,445 SNPs to identify the three most likely disease-associated loci for the target inherited disease within 300 provided genes
Languages
Chinese
Native
English
Advanced
German
Elementary
Education
Sep 2022 - Present
Ludwig Maximilian University of Munich
PhD in Medical Research · Medical Research · Munich, Germany
Sep 2019 - Jun 2022
University of Chinese Academy of Sciences
Master of Science in AI-assisted Drug Design · AI-assisted Drug Design · China
Sep 2014 - Jun 2018
Huazhong Agricultural University
Bachelor of Science in Bioinformatics · Bioinformatics · China