Zhong Jiayuan, Li Junxian, Gu Xuerong, Ding Dandan, Ling Fei, Chen Pei, Liu Rui
School of Mathematics, Foshan University, Foshan 528000, China.
School of Mathematics, South China University of Technology, Guangzhou 510640, China.
Natl Sci Rev. 2025 May 14;12(8):nwaf189. doi: 10.1093/nsr/nwaf189. eCollection 2025 Aug.
Complex disease progression typically involves sudden, non-linear transitions with devastating effects. Uncovering such critical states, or pre-disease stages, and discovering dynamic network biomarkers (signaling molecules) is vital both for understanding disease progression and for preventing or delaying disease deterioration. However, detecting critical points from high-dimensional data with few samples, or from single-cell data, is challenging, as traditional statistical approaches often fail to deliver accurate results. In this study, based on optimal transport theory and Gaussian graphical models, we present a computational framework, the sample-perturbed Gaussian graphical model (sPGGM), designed to analyze disease progression and identify pre-disease stages at the level of a specific sample or cell. Specifically, by employing population-level optimal transport and Gaussian graphical models, sPGGM characterizes the dynamic differences between the baseline distribution and the distribution perturbed by the specific case sample, thus enabling the identification of pre-disease stages and the discovery of signaling molecules during disease progression. We demonstrate the reliability and effectiveness of the method on a simulated dataset and on several types of real data, including four single-cell datasets, influenza infection data, and six distinct bulk tumour datasets. Compared with existing single-sample methods, the proposed method shows improved capability in pinpointing critical points or pre-disease stages. Moreover, analysis of the functional roles of the identified signaling molecules supports the validity of the computational results.
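To make the idea of comparing a baseline distribution with a sample-perturbed one concrete, the following is a minimal sketch and not the authors' sPGGM implementation. It assumes that both distributions are fitted as multivariate Gaussians and compared with the closed-form 2-Wasserstein distance between Gaussians. The function names (`fit_gaussian`, `gaussian_w2`, `perturbation_score`), the ridge term, and the toy data are all hypothetical choices made for illustration; the abstract does not specify the actual score or the graphical-model estimation step.

```python
import numpy as np


def psd_sqrt(mat):
    """Symmetric square root of a positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh((mat + mat.T) / 2.0)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def fit_gaussian(samples, ridge=1e-3):
    """Mean and ridge-regularized covariance of rows in `samples` (n x p, p >= 2).

    The ridge term is an assumption that keeps the covariance well
    conditioned when there are few samples relative to features.
    """
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False) + ridge * np.eye(samples.shape[1])
    return mean, cov


def gaussian_w2(mean_a, cov_a, mean_b, cov_b):
    """Closed-form 2-Wasserstein distance between two Gaussians."""
    root_a = psd_sqrt(cov_a)
    cross = psd_sqrt(root_a @ cov_b @ root_a)
    squared = np.sum((mean_a - mean_b) ** 2) + np.trace(cov_a + cov_b - 2.0 * cross)
    return float(np.sqrt(max(squared, 0.0)))


def perturbation_score(reference, case_sample):
    """Distance between the baseline Gaussian and the Gaussian refitted
    after adding one case sample to the reference samples (hypothetical score)."""
    baseline = fit_gaussian(reference)
    perturbed = fit_gaussian(np.vstack([reference, case_sample]))
    return gaussian_w2(*baseline, *perturbed)


# Toy usage: a sample far from the reference group perturbs the fit more
# than a sample sitting at the reference mean.
rng = np.random.default_rng(0)
reference = rng.normal(size=(20, 5))
typical = reference.mean(axis=0)
outlier = typical + 5.0
print(perturbation_score(reference, typical) < perturbation_score(reference, outlier))
```

In this toy setting, a larger score means that a single case sample shifts the fitted distribution further from the baseline. That is the intuition behind a sample-level perturbation score; how sPGGM turns such differences into a critical-point signal is described in the paper itself.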