Integrating multi-OMICS data through sparse canonical correlation analysis for the prediction of complex traits: a comparison study.

Affiliation

Department of Mathematics, Imperial College London, London SW7 2AZ, UK.

Publication information

Bioinformatics. 2020 Nov 1;36(17):4616-4625. doi: 10.1093/bioinformatics/btaa530.

DOI: 10.1093/bioinformatics/btaa530
PMID: 32437529
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7750936/
Abstract

MOTIVATION

Recent developments in technology have enabled researchers to collect multiple OMICS datasets for the same individuals. The conventional approach to understanding the relationships between the collected datasets and the complex trait of interest is either to analyse each OMICS dataset separately from the rest or to test for associations between the OMICS datasets. In this work we show that integrating multiple OMICS datasets, instead of analysing them separately, improves our understanding of the relationships between them as well as the predictive accuracy for the tested trait. Several approaches have been proposed for the integration of heterogeneous and high-dimensional (p≫n) data, such as OMICS. The sparse variant of canonical correlation analysis (CCA) is a promising one: it penalizes the canonical variables so as to produce sparse latent variables while achieving maximal correlation between the datasets. In recent years, a number of approaches for implementing sparse CCA (sCCA) have been proposed; they differ in their objective functions, in the iterative algorithms used to obtain the sparse latent variables and in the assumptions they make about the original datasets.
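To make the idea of sparse canonical vectors concrete, the following is a minimal sketch of one family of sCCA algorithms: alternating soft-thresholding on the cross-correlation matrix, in the spirit of penalized matrix decomposition. It is not the code from the authors' repository. It assumes that the within-dataset covariance matrices are treated as identity, and it uses fixed thresholds (`lam_u`, `lam_v`) rather than searching for a threshold that meets an L1 constraint. All function names, parameter values and the simulated data are illustrative assumptions.

```python
import numpy as np


def soft_threshold(a, lam):
    """Elementwise soft-thresholding: shrink every entry towards zero by lam."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def l2_normalize(a):
    """Scale a to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a


def sparse_cca_rank1(X, Y, lam_u=0.4, lam_v=0.4, n_iter=100, tol=1e-8):
    """First pair of sparse canonical vectors (simplified, fixed thresholds).

    Assumes no column of X or Y is constant.
    """
    # Standardise columns so that C holds sample cross-correlations.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = Xs.T @ Ys / Xs.shape[0]

    # Initialise v with the leading right singular vector of C.
    _, _, vt = np.linalg.svd(C, full_matrices=False)
    v = vt[0]
    u = np.zeros(X.shape[1])

    for _ in range(n_iter):
        # Update each vector with the other held fixed.
        u_new = l2_normalize(soft_threshold(C @ v, lam_u))
        v_new = l2_normalize(soft_threshold(C.T @ u_new, lam_v))
        change = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if change < tol:
            break
    return u, v


# Simulated example (hypothetical data): one latent signal z drives the
# first three columns of X and the first two columns of Y.
rng = np.random.default_rng(0)
n, p, q = 200, 30, 20
z = rng.normal(size=n)
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))
X[:, :3] = z[:, None] + 0.5 * rng.normal(size=(n, 3))
Y[:, :2] = z[:, None] + 0.5 * rng.normal(size=(n, 2))

u, v = sparse_cca_rank1(X, Y)
rho = np.corrcoef(X @ u, Y @ v)[0, 1]
print("non-zero weights in u:", np.flatnonzero(u))
print("non-zero weights in v:", np.flatnonzero(v))
print("correlation of the canonical variates:", round(rho, 3))
```

Larger thresholds give sparser vectors at the cost of some correlation. In practice the penalty level would be tuned, for example by cross-validation, rather than fixed as it is here.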

RESULTS

Through a comparative study we have explored the performance of the conventional CCA proposed by Parkhomenko et al., the penalized matrix decomposition CCA proposed by Witten and Tibshirani and its extension proposed by Suo et al. These methods were modified to allow for different penalty functions. Although sCCA is an unsupervised learning approach for understanding the relationships between datasets, we have recast the problem as a supervised learning one and investigated how the computed latent variables can be used for predicting complex traits. The approaches were also extended to allow for multiple (more than two) datasets, with the trait included as one of the input datasets. Both approaches showed improvement over conventional predictive models that include one or multiple datasets.
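The supervised use of the latent variables can be illustrated with a small sketch: project each dataset onto its canonical weight vector, then regress the trait on the resulting canonical variates. This is an assumed, simplified reading of the first strategy described above, not the authors' implementation. The weight vectors `u` and `v` are set by hand to stand in for the output of an sCCA fit, and the function names, the train/test split and the simulated data are all hypothetical.

```python
import numpy as np


def canonical_variates(X, Y, u, v):
    """Project each dataset onto its canonical weight vector."""
    return np.column_stack([X @ u, Y @ v])


def fit_trait_model(X, Y, u, v, trait):
    """Least-squares fit of the trait on an intercept plus the two variates."""
    Z = canonical_variates(X, Y, u, v)
    design = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(design, trait, rcond=None)
    return coef


def predict_trait(X, Y, u, v, coef):
    """Predict the trait for new samples from their canonical variates."""
    Z = canonical_variates(X, Y, u, v)
    return coef[0] + Z @ coef[1:]


# Simulated example (hypothetical data): a latent signal z drives a few
# columns of each dataset and also the trait.
rng = np.random.default_rng(1)
n, p, q = 300, 30, 20
z = rng.normal(size=n)
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))
X[:, :3] += z[:, None]
Y[:, :2] += z[:, None]
trait = 2.0 * z + rng.normal(scale=0.5, size=n)

# Hand-set sparse weights standing in for vectors estimated by sCCA.
u = np.zeros(p)
u[:3] = 1.0 / np.sqrt(3.0)
v = np.zeros(q)
v[:2] = 1.0 / np.sqrt(2.0)

# Fit on the first 200 samples, evaluate on the remaining 100.
train, test = np.arange(200), np.arange(200, 300)
coef = fit_trait_model(X[train], Y[train], u, v, trait[train])
pred = predict_trait(X[test], Y[test], u, v, coef)
r = np.corrcoef(pred, trait[test])[0, 1]
print("held-out correlation between predicted and observed trait:", round(r, 3))
```

In a real analysis the weight vectors would be estimated on the training samples only, so that the held-out evaluation is not contaminated by the test data.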

AVAILABILITY AND IMPLEMENTATION

https://github.com/theorod93/sCCA.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/55b7/7750936/66e2a943229a/btaa530f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/55b7/7750936/573103308e3a/btaa530f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/55b7/7750936/5a827f691906/btaa530f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/55b7/7750936/d30bd08c6426/btaa530f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/55b7/7750936/4f029e7c3490/btaa530f5.jpg