
Does combining numerous data types in multi-omics data improve or hinder performance in survival prediction? Insights from a large-scale benchmark study.

Affiliations

Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany.

Laboratory for Leukemia Diagnostics, Department of Medicine III, LMU University Hospital, LMU Munich, Munich, Germany.

Publication information

BMC Med Inform Decis Mak. 2024 Sep 2;24(1):244. doi: 10.1186/s12911-024-02642-9.

DOI: 10.1186/s12911-024-02642-9
PMID: 39223659
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11370316/

Abstract

BACKGROUND

Predictive modeling based on multi-omics data, which incorporates several types of omics data for the same patients, has shown potential to outperform single-omics predictive modeling. Most research in this domain focuses on incorporating numerous data types, despite the complexity and cost of acquiring them. The prevailing assumption is that increasing the number of data types necessarily improves predictive performance. However, the integration of less informative or redundant data types could potentially hinder this performance. Therefore, identifying the most effective combinations of omics data types that enhance predictive performance is critical for cost-effective and accurate predictions.

METHODS

In this study, we systematically evaluated the predictive performance of all 31 possible combinations including at least one of five genomic data types (mRNA, miRNA, methylation, DNAseq, and copy number variation) using 14 cancer datasets with right-censored survival outcomes, publicly available from the TCGA database. We employed various prediction methods and up-weighted clinical data in every model to leverage their predictive importance. Harrell's C-index and the integrated Brier Score were used as performance measures. To assess the robustness of our findings, we performed a bootstrap analysis at the level of the included datasets. Statistical testing was conducted for key results, limiting the number of tests to ensure a low risk of false positives.
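
As an illustration of two quantities named in the methods, the sketch below enumerates the non-empty combinations of the five data types (2^5 - 1 = 31) and computes Harrell's C-index for right-censored data. It is a minimal sketch and not the authors' code: the function names, the label `CNV` for copy number variation, and the toy data are assumptions made for this example, and pairs with tied observed times are simply skipped.

```python
from itertools import combinations

# The five data types named in the abstract ("CNV" is shorthand chosen here).
OMICS_TYPES = ["mRNA", "miRNA", "methylation", "DNAseq", "CNV"]


def all_nonempty_combinations(items):
    """Return every non-empty subset of items, smallest subsets first."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter observed time than subject j. The pair is concordant
    when subject i also has the higher risk score; tied risk scores count
    as half. Pairs with tied observed times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


if __name__ == "__main__":
    combos = all_nonempty_combinations(OMICS_TYPES)
    print(len(combos))  # 31

    # Toy data (hypothetical): the third subject is censored.
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.7, 0.5, 0.1]  # higher score = higher predicted risk
    print(harrell_c_index(times, events, risks))  # 1.0
```

A C-index of 1.0 means every comparable pair is ranked correctly, while 0.5 corresponds to random ranking. The integrated Brier score, the second measure used in the study, additionally requires predicted survival probabilities over time and is not sketched here.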

RESULTS

Contrary to expectations, we found that using only mRNA data or a combination of mRNA and miRNA data was sufficient for most cancer types. For some cancer types, the additional inclusion of methylation data led to improved prediction results. Far from enhancing performance, the introduction of more data types most often resulted in a decline in performance, which varied between the two performance measures.

CONCLUSIONS

Our findings challenge the prevailing notion that combining multiple omics data types in multi-omics survival prediction improves predictive performance. Thus, the widespread approach in multi-omics prediction of incorporating as many data types as possible should be reconsidered to avoid suboptimal prediction results and unnecessary expenditure.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b1ef/11370316/535ab23b21d8/12911_2024_2642_Figb_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b1ef/11370316/23120a239172/12911_2024_2642_Figa_HTML.jpg
