Amaro Adriana, Pfeffer Max, Pfeffer Ulrich, Reggiani Francesco
IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy.
Faculty of Mathematics, Technical University of Chemnitz, 09111 Chemnitz, Germany.
Biomedicines. 2022 Dec 13;10(12):3240. doi: 10.3390/biomedicines10123240.
The number of multi-domain genomic datasets for human tumors is growing. Multi-domain data are usually interpreted by analyzing each domain separately and integrating the results post hoc. Data fusion techniques instead integrate multi-domain data directly, ideally improving tumor classification for prognosis and for the prediction of response to therapy. We have previously described the joint singular value decomposition (jSVD) technique as a means of data fusion. Here, we report on the implementation of these methods as open-source code in R and Python and on their application. The Cancer Genome Atlas (TCGA) Skin Cutaneous Melanoma (SKCM) dataset was used as a benchmark to evaluate whether data fusion approaches can improve the molecular classification of cancers in a clinically relevant manner. Our data show that the data fusion approach does not generate classification results superior to those obtained using single-domain data. Data from different domains are not entirely independent of each other, and molecular classes are characterized by features that span different domains. Data fusion techniques might be better suited to response prediction, where they could help identify predictive features in a domain-independent manner for use as biomarkers.
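To make the idea of fusing domains through a shared decomposition concrete, the following is a minimal sketch in Python with NumPy. It is not the jSVD algorithm described by the authors. It swaps in a simpler stand-in: each domain matrix is row-centered and scaled, the matrices are stacked along the feature axis, and a single ordinary SVD is taken, so that all domains share one set of sample-space vectors. The function name `stacked_svd_fusion`, the features-by-samples layout, the scaling choice, and the synthetic "expression" and "methylation" matrices are all assumptions made for illustration. No TCGA data are used.

```python
import numpy as np


def stacked_svd_fusion(domains, rank):
    """Fuse several (features x samples) matrices via one SVD of their stack.

    Illustrative stand-in only; this is NOT the jSVD algorithm from the paper.
    Returns a (samples x rank) embedding shared by all domains and the
    corresponding singular values.
    """
    n_samples = domains[0].shape[1]
    scaled = []
    for block in domains:
        if block.shape[1] != n_samples:
            raise ValueError("all domains must share the same samples (columns)")
        # Center each feature across samples.
        centered = block - block.mean(axis=1, keepdims=True)
        # Scale each domain to unit Frobenius norm so that a domain with many
        # features or large values does not dominate the decomposition.
        norm = np.linalg.norm(centered)
        scaled.append(centered / norm if norm > 0 else centered)
    stacked = np.vstack(scaled)
    _, singular_values, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:rank].T, singular_values[:rank]


# Synthetic demonstration (hypothetical data): two domains carrying the same
# hidden two-class structure across 40 samples.
rng = np.random.default_rng(0)
n_samples = 40
groups = np.repeat([1.0, -1.0], n_samples // 2)
expression = np.outer(rng.normal(size=200), groups) + 0.5 * rng.normal(size=(200, n_samples))
methylation = np.outer(rng.normal(size=150), groups) + 0.5 * rng.normal(size=(150, n_samples))

embedding, singular_values = stacked_svd_fusion([expression, methylation], rank=2)

# The sign of the leading shared component gives a crude two-class split.
labels = embedding[:, 0] > 0
```

In this toy setting both domains encode the same grouping, so the fused component recovers no more than either domain would alone. That loosely mirrors the abstract's observation that domains are not independent of each other, although the sketch is far too simple to support any conclusion about real tumor data.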