MLOmics: Cancer Multi-Omics Database for Machine Learning.

Author information

Yang Ziwei, Kotoge Rikuto, Piao Xihao, Chen Zheng, Zhu Lingwei, Gao Peng, Matsubara Yasuko, Sakurai Yasushi, Sun Jimeng

Affiliations

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan.

SANKEN, Osaka University, Osaka, Japan.

Publication information

Sci Data. 2025 May 30;12(1):913. doi: 10.1038/s41597-025-05235-x.

DOI: 10.1038/s41597-025-05235-x
PMID: 40447627
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12125382/
Abstract

Framing the investigation of diverse cancers as a machine learning problem has recently shown significant potential in multi-omics analysis and cancer research. These successful machine learning models are powered by high-quality training datasets with sufficient data volume and adequate preprocessing. However, while several public data portals exist, including The Cancer Genome Atlas (TCGA) multi-omics initiative and open databases such as LinkedOmics, these databases are not ready to use off the shelf with existing machine learning models. In this paper, we introduce MLOmics, an open cancer multi-omics database that aims to better serve the development and evaluation of bioinformatics and machine learning models. MLOmics contains 8,314 patient samples covering all 32 cancer types with four omics types, stratified features, and extensive baselines. Complementary support for downstream analysis and bio-knowledge linking is also included to support interdisciplinary analysis.
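The abstract notes that existing portals are not directly usable by machine learning models without preprocessing. As a rough illustration of what "ML-ready" can mean for multi-omics data, the Python sketch below keeps only patients measured in every omics layer, concatenates their per-layer feature vectors into one matrix (a simple early-integration scheme), and z-scores each feature column. This is not MLOmics's documented pipeline: the function names, the dict-of-dicts input layout, and the layer names `mrna` and `cnv` are hypothetical placeholders chosen for the example.

```python
from statistics import mean, pstdev


def align_and_concatenate(omics):
    """Hypothetical helper: align omics layers on shared sample IDs.

    `omics` maps a layer name to {sample_id: list_of_feature_values}.
    Assumes at least one layer and a fixed feature length per layer.
    Returns (sorted shared sample IDs, one concatenated row per sample).
    """
    # Keep only samples that were measured in every omics layer.
    shared = sorted(set.intersection(*(set(layer) for layer in omics.values())))
    layer_names = sorted(omics)  # fixed layer order, so columns are reproducible
    matrix = []
    for sample_id in shared:
        row = []
        for name in layer_names:
            row.extend(omics[name][sample_id])
        matrix.append(row)
    return shared, matrix


def zscore_columns(matrix):
    """Standardize each feature column to mean 0 and population std 1.

    Constant columns (std == 0) are mapped to 0.0 instead of dividing by zero.
    """
    standardized_cols = []
    for col in zip(*matrix):
        mu, sd = mean(col), pstdev(col)
        standardized_cols.append([(x - mu) / sd if sd > 0 else 0.0 for x in col])
    # Transpose back from columns to rows.
    return [list(row) for row in zip(*standardized_cols)]


if __name__ == "__main__":
    # Toy data: patient P2 lacks the second layer, so it is dropped.
    toy = {
        "mrna": {"P1": [1.0, 2.0], "P2": [3.0, 4.0], "P3": [5.0, 6.0]},
        "cnv": {"P1": [0.0], "P3": [1.0]},
    }
    ids, X = align_and_concatenate(toy)
    print(ids, zscore_columns(X))
```

Concatenation is only one integration strategy; intermediate or late integration (learning per-layer representations first) is common for deep models, and in a real evaluation the normalization statistics would be computed on the training split only to avoid leakage.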

摘要

将各种癌症的研究构建为一个机器学习问题,最近在多组学分析和癌症研究中显示出巨大潜力。这些成功的机器学习模型的强大助力在于拥有足够数据量和充分预处理的高质量训练数据集。然而,尽管存在几个公共数据门户,包括癌症基因组图谱(TCGA)多组学计划或如LinkedOmics这样的开放库,但这些数据库对于现有的机器学习模型并非现成可用。在本文中,我们介绍了MLOmics,这是一个开放的癌症多组学数据库,旨在更好地服务于生物信息学和机器学习模型的开发与评估。MLOmics包含8314个患者样本,涵盖所有32种癌症类型,具有四种组学类型、分层特征和广泛的基线。还包括对下游分析和生物知识链接的补充支持,以支持跨学科分析。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aed/12125382/649c2e989dc3/41597_2025_5235_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aed/12125382/d9f850003498/41597_2025_5235_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aed/12125382/4abd022a87d5/41597_2025_5235_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aed/12125382/9a3085f8be28/41597_2025_5235_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aed/12125382/649c2e989dc3/41597_2025_5235_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aed/12125382/d9f850003498/41597_2025_5235_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aed/12125382/4abd022a87d5/41597_2025_5235_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aed/12125382/9a3085f8be28/41597_2025_5235_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aed/12125382/649c2e989dc3/41597_2025_5235_Fig4_HTML.jpg
