A review on multi-omics integration for aiding study design of large scale TCGA cancer datasets.

Authors

Han Eonyong, Kwon Hwijun, Jung Inuk

Affiliation

School of Computer Science and Engineering, Kyungpook National University, Buk-gu, Daegu, 41566, Republic of Korea.

Publication

BMC Genomics. 2025 Aug 22;26(1):769. doi: 10.1186/s12864-025-11925-y.

DOI: 10.1186/s12864-025-11925-y
PMID: 40847282
Abstract

BACKGROUND

Rapid advances in high-throughput sequencing technologies allow omics features to be measured in detail and accurately within their biological context. Integrating different omics types produces heterogeneous datasets that are challenging to analyze because measurement units, sample numbers, and features differ across omics types. Currently, there are no generalized guidelines for multi-omics study design (MOSD) decisions, such as choosing an appropriate number of samples and features and the type of preprocessing and integration needed for robust analysis results. We propose a suggested guideline for MOSD covering nine important factors: sample size, feature selection, preprocessing strategy, noise characterization, class balance, number of classes, cancer subtype combination, omics combination, and clinical features.
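The background lists preprocessing strategy and feature selection among the nine MOSD factors but does not fix a procedure. The code below is a minimal sketch that assumes a simple early-integration setup, which is not necessarily the authors' pipeline. It keeps the highest-variance features within each omics block and z-scores the retained features so that differing measurement units share a common scale. It then concatenates the blocks column-wise. The function names `select_top_variance`, `zscore` and `integrate`, the variance filter, and the default fraction of 0.05 are all illustrative assumptions.

```python
import numpy as np


def select_top_variance(block: np.ndarray, fraction: float = 0.05) -> np.ndarray:
    """Return sorted column indices of the highest-variance features in one block.

    `block` is a samples x features matrix for a single omics type (hypothetical
    input). The default fraction of 0.05 is an arbitrary illustrative value
    below the 10% level reported in the abstract.
    """
    n_features = block.shape[1]
    k = max(1, int(n_features * fraction))  # keep at least one feature
    variances = block.var(axis=0)
    # argsort is ascending, so the last k indices have the largest variances;
    # sorting them afterwards preserves the original column order.
    return np.sort(np.argsort(variances)[-k:])


def zscore(block: np.ndarray) -> np.ndarray:
    """Standardize each feature (column) to mean 0 and unit variance."""
    mean = block.mean(axis=0)
    std = block.std(axis=0)
    std[std == 0] = 1.0  # constant features: avoid division by zero
    return (block - mean) / std


def integrate(blocks: list, fraction: float = 0.05) -> np.ndarray:
    """Per-block feature selection and scaling, then column-wise concatenation.

    All blocks must contain the same samples in the same row order.
    """
    n_samples = blocks[0].shape[0]
    processed = []
    for block in blocks:
        if block.shape[0] != n_samples:
            raise ValueError("all omics blocks must have the same samples (rows)")
        idx = select_top_variance(block, fraction)
        processed.append(zscore(block[:, idx]))
    return np.hstack(processed)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expression = rng.normal(size=(60, 2000))  # simulated expression-like block
    methylation = rng.uniform(size=(60, 500))  # simulated beta-value-like block
    X = integrate([expression, methylation])
    print(X.shape)  # (60, 125): 100 + 25 selected features
```

The sketch selects features before scaling on purpose. After z-scoring, every feature has unit variance, so a variance filter applied afterwards could no longer separate informative features from flat ones. The default fraction sits below the 10% level the abstract reports as robust. Other selection criteria, such as supervised or method-specific filters, could be swapped in.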

RESULTS

To assess the effectiveness of our proposed MOSD guidelines, we designed and conducted seven benchmark tests in which 10 clustering methods were applied to various TCGA cancer datasets, with the objective of clustering cancer subtypes. Cancer subtype discrimination was robust when the following criteria were met: 26 or more samples per class, fewer than 10% of omics features selected, a sample balance under a 3:1 ratio, and a noise level below 30%. Feature selection was particularly important, improving clustering performance by 34%.
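The four numeric criteria in the results can be expressed as a pre-analysis check on a planned cohort. The sketch below is hypothetical. `check_mosd_design` and `DesignReport` are illustrative names, and the reading of each threshold is an assumption, because the abstract does not define how the quantities are measured. Here, class balance is the largest class size divided by the smallest. A ratio of exactly 3 counts as failing, since the criterion says "under". The feature fraction is the number of selected features divided by the total. The caller supplies the noise level as a fraction between 0 and 1.

```python
from collections import Counter
from dataclasses import dataclass

# Thresholds as reported in the RESULTS section of the abstract.
MIN_SAMPLES_PER_CLASS = 26   # at least 26 samples per class
MAX_FEATURE_FRACTION = 0.10  # select fewer than 10% of features
MAX_CLASS_RATIO = 3.0        # largest:smallest class below 3:1 (assumed reading)
MAX_NOISE_LEVEL = 0.30       # noise level below 30%


@dataclass
class DesignReport:
    min_class_size: int
    class_ratio: float       # largest class size / smallest class size
    feature_fraction: float  # selected features / total features
    noise_level: float
    issues: list

    @property
    def passes(self) -> bool:
        return not self.issues


def check_mosd_design(labels, n_selected_features, n_total_features, noise_level):
    """Compare a planned design with the abstract's four numeric criteria.

    labels: one subtype label per sample.
    noise_level: estimated noise as a fraction in [0, 1] (hypothetical measure).
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to assess balance")
    smallest, largest = min(counts.values()), max(counts.values())
    ratio = largest / smallest
    fraction = n_selected_features / n_total_features

    issues = []
    if smallest < MIN_SAMPLES_PER_CLASS:
        issues.append(f"smallest class has {smallest} samples (< {MIN_SAMPLES_PER_CLASS})")
    if fraction >= MAX_FEATURE_FRACTION:
        issues.append(f"{fraction:.1%} of features selected (should be < 10%)")
    if ratio >= MAX_CLASS_RATIO:
        issues.append(f"class ratio {ratio:.2f}:1 (should be < 3:1)")
    if noise_level >= MAX_NOISE_LEVEL:
        issues.append(f"noise level {noise_level:.0%} (should be < 30%)")
    return DesignReport(smallest, ratio, fraction, noise_level, issues)


if __name__ == "__main__":
    # Hypothetical cohort: three subtypes with 40, 30 and 28 samples.
    labels = ["A"] * 40 + ["B"] * 30 + ["C"] * 28
    report = check_mosd_design(labels, n_selected_features=1500,
                               n_total_features=20000, noise_level=0.1)
    print(report.passes, report.issues)  # True []
```

A passing report only means the design meets these four reported thresholds. The remaining factors (number of classes, subtype and omics combinations, clinical features) are not checked here.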

CONCLUSION

These findings provide evidence-based recommendations for MOSD, enabling researchers to optimize analytical approaches and improve the reliability of results across cancer datasets. The proposed MOSD framework offers a suggested guideline that addresses both computational and biological factors in multi-omics data integration.

Similar articles

1. A review on multi-omics integration for aiding study design of large scale TCGA cancer datasets. BMC Genomics. 2025 Aug 22;26(1):769. doi: 10.1186/s12864-025-11925-y.
2. Prescription of Controlled Substances: Benefits and Risks.
3. sCIN: a contrastive learning framework for single-cell multi-omics data integration. Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf411.
4. MO-GCAN: multi-omics integration based on graph convolutional and attention networks. Bioinformatics. 2025 Aug 2;41(8). doi: 10.1093/bioinformatics/btaf405.
5. Audit and feedback: effects on professional practice. Cochrane Database Syst Rev. 2025 Mar 25;3(3):CD000259. doi: 10.1002/14651858.CD000259.pub4.
6. Multi-omics single-cell data alignment and integration with enhanced contrastive learning and differential attention mechanism. Bioinformatics. 2025 Aug 2;41(8). doi: 10.1093/bioinformatics/btaf443.
7. Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis. Cochrane Database Syst Rev. 2021 Apr 19;4(4):CD011535. doi: 10.1002/14651858.CD011535.pub4.
8. Novel multi-omics deconfounding variational autoencoders can obtain meaningful disease subtyping. Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae512.
9. MarkVCID cerebral small vessel consortium: I. Enrollment, clinical, fluid protocols. Alzheimers Dement. 2021 Apr;17(4):704-715. doi: 10.1002/alz.12215. Epub 2021 Jan 21.
10. Effective Integration of Single-Cell Multi-Omics Data Using Improved Network-Based Integrative Clustering with Multigraph Regularization. J Comput Biol. 2025 Jun;32(6):601-614. doi: 10.1089/cmb.2023.0460. Epub 2025 May 22.
