Guo Yanghong, Yu Lei, Guo Lei, Xu Lin, Li Qiwei
Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, TX 75080, United States.
Quantitative Biomedical Research Center, Peter O'Donnell Jr School of Public Health, The University of Texas Southwestern Medical Center, Dallas, TX 75390, United States.
Biometrics. 2025 Jan 7;81(1). doi: 10.1093/biomtc/ujaf005.
The abundance of various cell types can vary significantly among patients with different phenotypes, and even among those with the same phenotype. Mounting evidence indicates that other clinical variables, such as age, gender, and lifestyle habits, can also influence the abundance of certain cell types. However, current methods for integrating single-cell-level omics data with clinical variables are inadequate. In this study, we propose a regularized Bayesian Dirichlet-multinomial regression framework to investigate the relationship between single-cell RNA sequencing data and patient-level clinical data. The model also employs a novel hierarchical tree structure to identify such relationships at different cell-type levels. Our model uncovers significant associations between specific cell types and clinical variables across three distinct diseases: pulmonary fibrosis, COVID-19, and non-small cell lung cancer. This integrative analysis provides biological insights and could inform clinical interventions for various diseases.
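For readers unfamiliar with the model class named in the abstract, the following is a minimal sketch of a generic Dirichlet-multinomial regression log-likelihood, in which per-patient cell-type counts depend on clinical covariates through a log-linear link. It is not the authors' implementation: the function name `dm_log_likelihood`, the argument layout, and the log link are assumptions made for illustration, and the regularizing prior and hierarchical tree structure described in the abstract are omitted.

```python
import math

def dm_log_likelihood(counts, covariates, beta):
    """Log-likelihood of a generic Dirichlet-multinomial regression.

    counts:     list of per-patient cell-type count vectors (length K each)
    covariates: list of per-patient covariate vectors (length p each;
                include a leading 1.0 for an intercept)
    beta:       p x K coefficient matrix as a list of rows (hypothetical layout)
    """
    total = 0.0
    for y, x in zip(counts, covariates):
        # Assumed log-linear link: alpha_k = exp(sum_j x_j * beta[j][k])
        alpha = [
            math.exp(sum(x_j * b_j[k] for x_j, b_j in zip(x, beta)))
            for k in range(len(y))
        ]
        n, a = sum(y), sum(alpha)
        # Multinomial coefficient: n! / prod_k y_k!
        total += math.lgamma(n + 1) - sum(math.lgamma(y_k + 1) for y_k in y)
        # Dirichlet normalising ratio: Gamma(a) / Gamma(n + a)
        total += math.lgamma(a) - math.lgamma(n + a)
        # Per-cell-type terms: Gamma(y_k + alpha_k) / Gamma(alpha_k)
        total += sum(
            math.lgamma(y_k + a_k) - math.lgamma(a_k)
            for y_k, a_k in zip(y, alpha)
        )
    return total

# Illustrative (made-up) input: two patients, three cell types,
# covariates = [intercept, scaled age].
example_counts = [[120, 30, 50], [80, 90, 30]]
example_covariates = [[1.0, 0.2], [1.0, -0.5]]
example_beta = [[0.5, 0.0, -0.3], [0.1, -0.4, 0.0]]
example_ll = dm_log_likelihood(example_counts, example_covariates, example_beta)
```

In a Bayesian treatment such as the one the abstract describes, a likelihood of this form would be combined with priors on the coefficients. The particular priors and the tree-based extension are not specified in the abstract and are therefore not sketched here.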