A framework for block-wise missing data in multi-omics.

Affiliation

Department of Genetics, Microbiology and Statistics, University of Barcelona, Barcelona, Spain.

Publication information

PLoS One. 2024 Jul 23;19(7):e0307482. doi: 10.1371/journal.pone.0307482. eCollection 2024.

DOI: 10.1371/journal.pone.0307482
PMID: 39042603
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11265675/
Abstract

High-throughput technologies have generated vast amounts of omics data. There is broad consensus that integrating diverse omics sources improves predictive models and biomarker discovery. However, managing multiple omics datasets poses challenges such as data heterogeneity, noise, high dimensionality, and missing data, especially in block-wise patterns. This study addresses high dimensionality and block-wise missing data through a regularization and constraint-based approach. The methodology is implemented in the R package bwm for binary and continuous response variables and is applied to breast cancer and exposome multi-omics datasets, achieving strong performance even in scenarios where missing data is present in all omics. In the binary classification task, our proposed model achieves accuracy in the range of 86% to 92% and F1 in the range of 68% to 79%. In the regression task, the correlation between true and predicted responses is in the range of 72% to 76%. Performance metrics decline slightly as the percentage of missing data increases. In scenarios where block-wise missing data affects multiple omics, model performance actually surpasses that of scenarios where missing data is present in only one omics. One possible explanation is that the former scenarios introduce a greater diversity of observation profiles, leading to a more robust model. Depending on the specific omics being studied, there is greater consistency in feature selection when comparing block-wise missing data scenarios.
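
The abstract refers to "observation profiles" under block-wise missingness. As a minimal sketch of that idea, and not the bwm package's API, the Python snippet below groups samples by which omics blocks are fully observed. The function name `observation_profiles`, the block names, and the toy matrix are all hypothetical, and the sketch assumes that missing values are encoded as NaN and that a block counts as available only when none of its columns is missing.

```python
import numpy as np

def observation_profiles(X, blocks):
    """Group samples by which omics blocks are fully observed.

    X      : 2-D float array (samples x features), missing entries are NaN.
    blocks : dict mapping a block name to the list of its column indices.
    Returns (names, profiles), where profiles maps a tuple of booleans
    (one per block, in the order of names) to a list of sample indices.
    """
    names = list(blocks)
    # available[i, j] is True when sample i has no NaN in block j
    available = np.column_stack(
        [~np.isnan(X[:, blocks[name]]).any(axis=1) for name in names]
    )
    profiles = {}
    for i, row in enumerate(available):
        profiles.setdefault(tuple(bool(v) for v in row), []).append(i)
    return names, profiles

# Toy data (hypothetical): 4 samples, two blocks of 3 features each
X = np.arange(24, dtype=float).reshape(4, 6)
blocks = {"expression": [0, 1, 2], "methylation": [3, 4, 5]}
X[1, 3:6] = np.nan  # sample 1 lacks the methylation block
X[2, 0:3] = np.nan  # sample 2 lacks the expression block

names, profiles = observation_profiles(X, blocks)
for profile, samples in profiles.items():
    print(dict(zip(names, profile)), samples)
```

In this toy case there are three profiles: samples 0 and 3 have both blocks, sample 1 has only expression, and sample 2 has only methylation. A scenario in which missingness affects several omics yields more distinct profiles than one in which a single omics is missing, which is the diversity the abstract points to.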

