Shenzhen Bay Laboratory
A central question in human genetics is how the vast majority of disease-associated variants, over 90% of which fall in noncoding regions, actually cause disease. Our lab tackles this question by developing computational methods and building large-scale genomic resources to capture the full spectrum of molecular regulation. We work across three scales: new regulatory dimensions (from 3′ to 5′ ends of mRNA, from coding to noncoding RNA), new population resources (the first long-read multi-omics atlas for East Asian populations), and new cellular resolution (single-cell multi-dimensional QTL mapping). Our open-source tools are used by research groups worldwide. Since establishing the lab in 2021, we have published in Nature, Nature Genetics, Nature Biomedical Engineering, Nature Communications, Science Advances, and other journals, with multiple ongoing studies pushing these directions toward clinical translation.
Current functional genomics resources, most notably GTEx, are overwhelmingly based on European-ancestry cohorts and short-read sequencing. This leaves two critical blind spots: East Asian populations lack dedicated regulatory references, and structural variants (SVs) and tandem repeats (TRs) are systematically missed. We lead the GTOP project to close both gaps simultaneously, integrating PacBio HiFi whole-genome sequencing, full-length transcriptomics, and proteomics across 33 tissues from 160 East Asian donors. GTOP has produced two main findings. First, SVs/TRs are 4.5-fold enriched among high-confidence causal variants, demonstrating that a large fraction of GWAS signals previously attributed to SNVs are in fact driven by structural variation. Second, over half of East Asian disease loci cannot be explained by existing European resources. Together, these findings establish an indispensable foundation for East Asian precision medicine.
Genome-wide association studies have identified thousands of noncoding disease variants, yet functional interpretation remains a major challenge because existing approaches focus almost exclusively on gene expression. Our work on alternative polyadenylation (APA) is the first to systematically link 3′UTR regulation to disease genetics: we constructed the first human 3′aQTL atlas (Nature Genetics, 2021), showing that APA alone explains ~16.1% of previously uninterpretable disease heritability. We then constructed the first immune-response APA regulatory map (Nature Communications, 2023) and have since expanded this work into a complete regulatory framework, extending from 3′aQTL to the discovery of 5′aQTL, a new class of transcription-initiation QTL (Science Advances, 2024; Nature Communications, 2026), and to the first genetic effect map of enhancer RNAs (Advanced Science, 2025). Collectively, these studies reveal that the majority of post-transcriptional causal genes are invisible to conventional eQTL analysis. We have further demonstrated clinical translation potential by showing that 3′UTR shortening drives tumor immune evasion and by developing a CRISPR-based 3′UTR-targeted therapeutic strategy (Nature Biomedical Engineering, 2026).
Tissue-level analyses average out cell-type-specific effects, which are often the most relevant to disease. Our sc-xQTL framework addresses this by simultaneously mapping gene expression, APA, and RNA editing QTLs at single-cell resolution across 31 immune cell types from over 6 million cells. The key finding is striking: post-transcriptional QTLs explain substantially more of the heritability of immune disease than traditional expression QTLs, and 70% of the causal genes identified through APA and RNA editing are entirely independent of gene expression. This fundamentally redefines the regulatory landscape of immune disease. Supporting this direction, we have built scQTLbase, the largest integrated single-cell eQTL database, covering 57 cell types and 95 cell states (Nucleic Acids Research, 2024), and developed MAAS for multi-modal single-cell integration (Genome Medicine, 2026).