A review on multi-omics integration for aiding study design of large scale TCGA cancer datasets.

Authors

Han Eonyong, Kwon Hwijun, Jung Inuk

Affiliation

School of Computer Science and Engineering, Kyungpook National University, Buk-gu, Daegu, 41566, Republic of Korea.

Publication

BMC Genomics. 2025 Aug 22;26(1):769. doi: 10.1186/s12864-025-11925-y.


DOI: 10.1186/s12864-025-11925-y
PMID: 40847282

Abstract

BACKGROUND: Rapid advancements in high-throughput sequencing technologies allow for detailed and accurate measurement of omics features within their biological context. The integration of different omics types creates heterogeneous datasets, presenting challenges in analysis due to variations in measurement units, sample numbers, and features. Currently, there is a lack of generalized guidelines for making decisions in multi-omics study design (MOSD), such as selecting an appropriate number of samples and features, type of preprocessing and integration for robust analysis results. We propose a suggestive guideline for MOSD, involving nine important factors: sample size, feature selection, preprocessing strategy, noise characterization, class balance, number of classes, cancer subtype combination, omics combination, and clinical features.

RESULTS: To assess the effectiveness of our proposed MOSD guidelines, we designed and conducted seven benchmark tests using 10 clustering methods on various TCGA cancer datasets with an objective of clustering cancer subtypes. The results indicated robust performance in terms of cancer subtype discrimination when adhering to the following criteria: 26 or more samples per class, selecting less than 10% of omics features, maintaining a sample balance under a 3:1 ratio, and keeping the noise level below 30%. Feature selection was particularly important, improving clustering performance by 34%.

CONCLUSION: These findings provide evidence-based recommendations for MOSD, enabling researchers to optimize analytical approaches and enhance the reliability of results across cancer datasets. The proposed MOSD framework offers a suggestive guideline addressing both computational and biological factors for multi-omics data integration.
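The four numeric criteria reported in the RESULTS can be read as a simple pre-analysis checklist. The following is a minimal sketch of such a check, not the authors' code: the function name `check_mosd_design`, its parameters, and the example subtype labels are hypothetical, and whether each threshold is strict or inclusive is an assumption based on the abstract's wording ("26 or more", "less than 10%", "under a 3:1 ratio", "below 30%").

```python
from collections import Counter

# Thresholds quoted from the abstract. Treating the last three as strict
# upper bounds is an assumption made for this sketch.
MIN_SAMPLES_PER_CLASS = 26
MAX_FEATURE_FRACTION = 0.10
MAX_CLASS_RATIO = 3.0
MAX_NOISE_FRACTION = 0.30


def check_mosd_design(labels, n_selected, n_total, noise_fraction):
    """Return a dict mapping each criterion to True (met) or False (not met).

    labels         -- one class label (e.g. cancer subtype) per sample
    n_selected     -- number of omics features kept after feature selection
    n_total        -- number of omics features before selection
    noise_fraction -- estimated noise level as a fraction between 0 and 1
    """
    counts = Counter(labels)
    smallest = min(counts.values())
    largest = max(counts.values())
    return {
        "samples_per_class": smallest >= MIN_SAMPLES_PER_CLASS,
        "feature_fraction": n_selected / n_total < MAX_FEATURE_FRACTION,
        "class_balance": largest / smallest < MAX_CLASS_RATIO,
        "noise_level": noise_fraction < MAX_NOISE_FRACTION,
    }


# Hypothetical design: two subtypes with 60 and 30 samples,
# 500 of 20,000 features selected, 10% estimated noise.
labels = ["subtype_A"] * 60 + ["subtype_B"] * 30
report = check_mosd_design(labels, n_selected=500, n_total=20000,
                           noise_fraction=0.10)
print(report)
```

A design that fails any entry in the returned dictionary falls outside the range in which the abstract reports robust subtype discrimination; the abstract does not say how performance degrades beyond those bounds.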
