Our computational and disease genomics lab develops and applies robust bioinformatics methods to large-scale, high-throughput, multi-dimensional genomics, epigenomics, and transcriptomics data. Our main emphasis is on understanding how human disease develops and progresses, with the expectation that these insights will translate to public health practice.
Genome-wide association studies have identified thousands of noncoding variants that are statistically associated with human traits and diseases. However, the functional interpretation of these variants remains a significant challenge. We described the first atlas of human 3’UTR alternative polyadenylation (APA) quantitative trait loci (3’aQTLs) (Nature Genetics 2021, Nucleic Acids Research 2022, Quantitative Biology 2022). APA occurs in approximately 70% of human genes and substantially impacts cellular proliferation, differentiation, and tumorigenesis. Mechanistically, 3’aQTLs could alter polyA motifs and RNA-binding protein binding sites, leading to thousands of APA changes. Importantly, 3’aQTLs can be used to interpret ~16.1% of trait-associated variants and are largely distinct from other QTLs such as eQTLs. Genetically regulated APA, as captured by 3’aQTLs, thus represents a novel molecular phenotype that can explain a large fraction of noncoding variants and provide new insights into complex traits and disease etiologies.
RNA structure can be dynamically regulated and can have profound consequences in disease contexts such as cancer. To study post-transcriptional regulation in human cancers, we have been applying quantitative techniques to biological problems relevant to cancer. For example, we constructed TC3A, the largest APA profile of human cancers (Nucleic Acids Research 2017). We also designed a new computational approach that draws on existing cancer “big data” and revealed the first evidence of a cancer-specific ubiquitin ligase driving APA, leading to 3’UTR shortening in many tumor types (Molecular Cell 2020). Moreover, we have extensive experience in general genomics and transcriptomics data analysis. We reported the draft genome sequence of the important human pathogen Salmonella Typhimurium multidrug-resistant strain ST1660/06 (Journal of Bacteriology 2012) and investigated the genome dynamics of Vibrio parahaemolyticus at the species level (BMC Genomics 2014).
To discover new RNAs and novel functions and mechanisms using deep sequencing, we have developed a series of computational methods for analyzing high-throughput sequencing data to tackle key biological questions. For example, we developed sPepfinder, a novel computational method for predicting small proteins in bacterial genomes, and revealed the first atlas of small proteins among proteobacteria (bioRxiv 2020). sPepfinder has been validated experimentally and led directly to the discovery of a novel virulence-associated small protein (MicroLife 2020). In addition, we have designed and applied our computational methods in multiple collaborative RNA deep sequencing projects. These projects covered a broad spectrum of the fate of cellular RNAs, from RNA transcription (Nucleic Acids Research 2017) through RNA processing (EMBO Journal 2017, Molecular Cell 2018) to RNA decay (Molecular Cell 2017), as well as single-cell RNA-seq (Nature Microbiology 2017). Many of these papers have been highly cited and recommended in the field.