School of Computer Science and Technology, Xidian University, Xi'an, China.
Department of Computer Science, The University of British Columbia Okanagan, Kelowna, British Columbia, Canada.
PLoS Comput Biol. 2021 Aug 12;17(8):e1009224. doi: 10.1371/journal.pcbi.1009224. eCollection 2021 Aug.
Computational integrative analysis has become a significant approach in the data-driven exploration of biological problems. Many integration methods for cancer subtyping have been proposed, but evaluating them is difficult because gold standards are lacking. Moreover, questions of practical importance remain open about how the choice of data types and their combinations affects the performance of integrative studies. Here, we constructed three classes of benchmarking datasets for nine cancers in TCGA by considering all eleven combinations of four multi-omics data types. Using these datasets, we conducted a comprehensive evaluation of ten representative integration methods for cancer subtyping in terms of accuracy (measured by combining clustering accuracy and clinical significance), robustness, and computational efficiency. We subsequently investigated the influence of different omics data on cancer subtyping and the effectiveness of their combinations. Refuting the widely held intuition that incorporating more types of omics data always produces better results, our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. Our analyses also suggested several effective combinations for most of the cancers in our study, which may be of particular interest to researchers in omics data analysis.
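The figure of eleven combinations is consistent with counting every subset of the four data types that contains at least two of them (6 pairs + 4 triples + 1 full set). The sketch below enumerates those subsets under that assumption. The abstract does not name the four data types, so the labels are placeholders, and `multi_omics_combinations` is a hypothetical helper rather than code from the study.

```python
from itertools import combinations

# Placeholder labels: the abstract does not name the four omics data types.
OMICS_TYPES = ["omics_A", "omics_B", "omics_C", "omics_D"]


def multi_omics_combinations(types):
    """Return every combination that contains at least two data types.

    Assumption: single data types are excluded, because integrating data
    requires at least two inputs.
    """
    combos = []
    for size in range(2, len(types) + 1):
        combos.extend(combinations(types, size))
    return combos


combos = multi_omics_combinations(OMICS_TYPES)
for combo in combos:
    print(" + ".join(combo))
print(len(combos))  # 11
```

Under this reading, each of the nine cancers would be benchmarked on the same eleven input configurations, which is what allows pairs, triples, and the full set to be compared directly.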