Department of Computer Science and Engineering, South China University of Technology, Guangdong, China.
BMC Med Inform Decis Mak. 2022 Jul 23;22(1):190. doi: 10.1186/s12911-022-01938-y.
Identifying patient subgroups by integrating multiple omics datasets is important for understanding a disease and for providing precise, personalized treatment. Because multiomics datasets are produced daily, fusing these heterogeneous big data to reveal their intrinsic structure is an urgent problem, and novel mathematical methods are needed to process them in a straightforward way.
We developed a novel method that subgroups patients with distinct survival rates by integrating multiple omics datasets, using principal component analysis to reduce their high dimensionality. We then constructed patient similarity graphs, merged the graphs in a subspace, and analyzed them on a Grassmann manifold. By selecting the most critical information from each omics level during merging, the proposed method can identify patient subgroups that had not been reported previously. We tested the method on empirical multiomics datasets from The Cancer Genome Atlas (TCGA).
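The pipeline described above (PCA per omics dataset, patient similarity graphs, subspace merging on a Grassmann manifold, then subgrouping) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The Gaussian kernel with a median-distance bandwidth, the symmetric normalized Laplacian, the merging rule (each layer's k smallest Laplacian eigenvectors U_i are treated as a point on the Grassmann manifold, and the merged subspace is taken from the modified Laplacian sum_i L_i - alpha * sum_i U_i U_i^T, a standard formulation for clustering multi-layer graphs), the weight `alpha`, and the final k-means step are all assumptions. Every function name is hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project patients (rows of X) onto the top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def similarity_graph(Z):
    """Gaussian-kernel patient similarity matrix (assumed kernel choice)."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    iu = np.triu_indices(len(Z), k=1)
    sigma2 = np.median(d2[iu])  # median heuristic for the bandwidth
    W = np.exp(-d2 / (2.0 * sigma2))
    np.fill_diagonal(W, 0.0)
    return W

def normalized_laplacian(W):
    """Symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def merge_on_grassmann(laplacians, k, alpha=1.0):
    """Merge per-omics spectral subspaces into one representative subspace."""
    n = laplacians[0].shape[0]
    L_mod = np.zeros((n, n))
    for L in laplacians:
        _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
        U_i = vecs[:, :k]            # this layer's point on the Grassmann manifold
        L_mod += L - alpha * U_i @ U_i.T
    _, vecs = np.linalg.eigh(L_mod)
    return vecs[:, :k]

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-first initialisation."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def subgroup_patients(omics_layers, k, n_components=10, alpha=1.0):
    """omics_layers: list of (patients x features) arrays with the same patient order."""
    laplacians = []
    for X in omics_layers:
        Z = pca_reduce(X, min(n_components, min(X.shape)))
        laplacians.append(normalized_laplacian(similarity_graph(Z)))
    U = merge_on_grassmann(laplacians, k, alpha)
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalise before clustering
    return kmeans(U, k)
```

Calling `subgroup_patients([mirna, expression, methylation], k)` with three hypothetical patient-by-feature matrices returns one subgroup label per patient. Whether the resulting subgroups differ in survival would then have to be checked separately, for example with a log-rank test, which this sketch does not include.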
Through the integration of microRNA, gene expression, and DNA methylation data, our method accurately identified patient subgroups and achieved superior performance compared with popular methods.