Establishing causality is arguably the ultimate goal in many fields of science, including biology, psychology, neuroscience, climate science, robotics, and quantum mechanics. The discovered causal relationships are useful for predicting a system's response to external interventions, a key step towards understanding and engineering that system. While the gold standard for causal discovery remains controlled experimentation, it can be too expensive, unethical, or even impossible in many cases, particularly on human beings. Therefore, inferring the unknown causal structures of complex systems from purely observational data is often desirable and, sometimes, the only option. My students and I are actively working on developing new structural causal models for automated causal discovery. The major challenges include non-identifiability due to Markov/observational/distributional equivalence, hidden confounders, feedback loops, heterogeneity, and complex data types (continuous, count, categorical, zero-inflated, compositional, functional, etc.). This line of research has been supported by NSF DMS-2112943.
Single-cell technologies such as single-cell RNA-seq (scRNA-seq) are rapidly revolutionizing a wide range of biomedical research. Unlike traditional bulk RNA-seq technologies that measure global gene expression averaged over a heterogeneous cell population, scRNA-seq measures gene expression in individual cells and is thus useful for unveiling transcriptomic heterogeneity at the single-cell level. Analyzing scRNA-seq data poses substantial statistical and computational challenges because the data are massive, sparse, heterogeneous, and noisy. We (in collaboration with Dr. Robert Chapkin and Dr. James Cai at Texas A&M) are developing novel statistical methods that address the challenges in (i) finding new cell types, (ii) characterizing cell differentiation dynamics, (iii) discovering gene regulations at the single-cell level, (iv) monitoring structural, functional, or phenotypic changes under different experimental conditions, and (v) relating rare transitional cells or cell phenotypes to disease progression. We are also working on integrating single-cell multi-omics data. The unique challenge of data integration in single-cell multi-omics is that each observation/cell can be assayed by only one modality. Therefore, neither horizontal nor vertical data integration applies here. This line of research has been supported by 1R01GM148974-01.
Microbes are everywhere – in the oceans, under the rocks, and inside us! For instance, our gastrointestinal tract harbors a diverse and abundant community of microbes that are essential for the well-being of the host by modulating host metabolism, immunity, and nutrient absorption. The microbiota has profound effects on the formation, development, and progression of many pathologies such as psoriasis, obesity, preterm birth, prediabetes, cancers, and neurological disorders. The composition of the microbiota shows great heterogeneity both within and across host populations (e.g., healthy vs. pathological), which can be partially explained by host-microbiome interactions. We (in collaboration with Dr. Robert Chapkin, Dr. Bani Mallick, Dr. Irina Gaynanova, and Dr. Jessica Galloway-Peña at Texas A&M) are interested in developing novel statistical methods to investigate microbial and host heterogeneity, and to model dynamic host-microbiome interaction networks in response to various factors including gene knockout, diet, and exposure to carcinogens. This line of research has been supported by Texas A&M Triads for Transformation, TAMIDS Postdoctoral Project Program, and Texas A&M College of Science Strategic Transformative Research Program.
Electronic Health Records
Electronic health records (EHR) are digital records of medical diagnoses and clinical symptoms documented by health care providers. The digital nature of EHR automates access to health information and allows physicians and researchers to take advantage of a wealth of data. EHR data have motivated us to develop various data-driven approaches for a wide range of tasks including automated phenotyping, drug assessment, clinical decision support, and data mining. We (in collaboration with Dr. Yanxun Xu and Dr. Leah Rubin at Johns Hopkins University) are currently developing new Bayesian parametric and nonparametric models for HIV EHR data to estimate the depressive effects of antiretroviral therapy based on patients' longitudinal medication data and depression outcomes, adjusting for socio-demographics, behavioral variables, genetic markers, clinical factors, and treatment history. We are also working on Bayesian reinforcement learning methods to adaptively select treatments for optimizing HIV patients' long-term health outcomes. Additionally, we (in collaboration with Dr. Sherecce Fields at Texas A&M and Dr. Chadi Calarge at Baylor College of Medicine) are investigating mental health problems in adolescents using NHANES data. This line of research has been supported by 1R01MH128085-01, NSF DMS-1918851, and 1R03MH127298-01.