Does combining numerous data types in multi-omics data improve or hinder performance in survival prediction? Insights from a large-scale benchmark study.

Affiliations

Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany.

Laboratory for Leukemia Diagnostics, Department of Medicine III, LMU University Hospital, LMU Munich, Munich, Germany.

Publication information

BMC Med Inform Decis Mak. 2024 Sep 2;24(1):244. doi: 10.1186/s12911-024-02642-9.

Abstract

BACKGROUND

Predictive modeling based on multi-omics data, which incorporates several types of omics data for the same patients, has shown potential to outperform single-omics predictive modeling. Most research in this domain focuses on incorporating numerous data types, despite the complexity and cost of acquiring them. The prevailing assumption is that increasing the number of data types necessarily improves predictive performance. However, the integration of less informative or redundant data types could potentially hinder this performance. Therefore, identifying the most effective combinations of omics data types that enhance predictive performance is critical for cost-effective and accurate predictions.

METHODS

In this study, we systematically evaluated the predictive performance of all 31 possible combinations including at least one of five genomic data types (mRNA, miRNA, methylation, DNAseq, and copy number variation) using 14 cancer datasets with right-censored survival outcomes, publicly available from the TCGA database. We employed various prediction methods and up-weighted clinical data in every model to leverage their predictive importance. Harrell's C-index and the integrated Brier Score were used as performance measures. To assess the robustness of our findings, we performed a bootstrap analysis at the level of the included datasets. Statistical testing was conducted for key results, limiting the number of tests to ensure a low risk of false positives.
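
Two quantitative parts of this design can be sketched in a few lines of Python. The 31 combinations are the 2^5 - 1 non-empty subsets of the five data types. Harrell's C-index is the fraction of comparable patient pairs whose predicted risks are ordered consistently with their observed survival.

This is a minimal illustrative sketch, not the authors' code. The names `OMICS`, `all_nonempty_combinations`, and `harrell_c_index` are hypothetical. Pairs with tied survival times are simply skipped. The integrated Brier Score, which additionally requires inverse-probability-of-censoring weights, is not shown.

```python
from itertools import combinations

# Hypothetical labels for the five genomic data types named in the abstract
# (CNV = copy number variation).
OMICS = ["mRNA", "miRNA", "methylation", "DNAseq", "CNV"]


def all_nonempty_combinations(blocks):
    """Return every non-empty subset of `blocks`; there are 2**n - 1 of them."""
    return [
        combo
        for k in range(1, len(blocks) + 1)
        for combo in combinations(blocks, k)
    ]


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (simplified sketch).

    A pair (i, j) is comparable when subject i had an event (events[i] truthy)
    and an observed time strictly shorter than subject j's. The pair is
    concordant when i also has the higher predicted risk; tied risk scores
    count as half-concordant. Pairs with tied observed times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    print(len(all_nonempty_combinations(OMICS)))  # 31
    # Toy data (not from the study): times, event indicators (1 = death), risks.
    print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.2, 0.4, 0.1]))  # 0.8
```

In the toy call, 5 pairs are comparable and 4 are concordant, giving 0.8. The discordant pair is the patient who died at time 4 but is ranked below the patient censored at time 6. A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking.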

RESULTS

Contrary to expectations, we found that using only mRNA data, or a combination of mRNA and miRNA data, was sufficient for most cancer types. For some cancer types, additionally including methylation data improved prediction results. Rather than enhancing performance, introducing more data types most often led to a decline in performance, the extent of which varied between the two performance measures.

CONCLUSIONS

Our findings challenge the prevailing notion that combining multiple omics data types in multi-omics survival prediction improves predictive performance. Thus, the widespread approach in multi-omics prediction of incorporating as many data types as possible should be reconsidered to avoid suboptimal prediction results and unnecessary expenditure.

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b1ef/11370316/23120a239172/12911_2024_2642_Figa_HTML.jpg
