Department of Medicine A, Albert-Schweitzer Campus 1, University Hospital Münster, 48149, Münster, Germany.
Cluster of Excellence EXC 1003, Cells in Motion, University of Münster, 48149, Münster, Germany.
Nat Commun. 2019 Nov 28;10(1):5417. doi: 10.1038/s41467-019-12713-5.
Gene expression is controlled by many simultaneous interactions, and in biology and medicine it is frequently measured collectively by high-throughput technologies. Inferring the generating effects and the cooperating genes from these data is a highly challenging task. Here, we present an unsupervised, hypothesis-generating learning concept termed signal dissection by correlation maximization (SDCM) that dissects large high-dimensional datasets into signatures. Each signature captures a particular signal pattern that was consistently observed across multiple genes and samples and was likely caused by the same underlying interaction. A key difference from other methods is our flexible nonlinear signal superposition model, combined with a precise regression technique. Analyzing gene expression in diffuse large B-cell lymphoma, our method discovers previously unidentified signatures that reveal significant differences in patient survival. These signatures are more predictive than those from various methods used for comparison and validate robustly across technological platforms. This implies highly specific extraction of clinically relevant gene interactions.