Sparse kernel canonical correlation analysis for discovery of nonlinear interactions in high-dimensional data.

Author information

Yoshida Kosuke, Yoshimoto Junichiro, Doya Kenji

Affiliations

Graduate School of Informatics, Kyoto University, Kyoto, Japan.

Neural Computation Unit, Okinawa Institute of Science and Technology, Okinawa, Japan.

Publication information

BMC Bioinformatics. 2017 Feb 14;18(1):108. doi: 10.1186/s12859-017-1543-x.

DOI:10.1186/s12859-017-1543-x

PMID:28196464

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5310015/

Abstract

BACKGROUND

Advances in high-throughput technologies in genomics, transcriptomics, and metabolomics have created demand for bioinformatics tools that integrate high-dimensional data from different sources. Canonical correlation analysis (CCA) is a statistical tool for finding linear associations between different types of information. Previous extensions of CCA designed to capture nonlinear associations, such as kernel CCA, did not allow feature selection or the capture of multiple canonical components. Here we propose a novel method, two-stage kernel CCA (TSKCCA), to select appropriate kernels in the framework of multiple kernel learning.
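For readers unfamiliar with the linear baseline, the following is a minimal sketch of classical CCA, not code from the paper. The function name `linear_cca_first_component` and the small ridge term `reg` are assumptions added here for illustration and numerical stability. It whitens both data blocks and takes the leading singular triplet of the whitened cross-covariance.

```python
import numpy as np


def linear_cca_first_component(X, Y, reg=1e-8):
    """Return (rho, a, b): first canonical correlation and weight vectors.

    X is (n, p) and Y is (n, q). `reg` is a small ridge term (an assumption
    of this sketch) that keeps the covariance matrices invertible.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Sample covariance and cross-covariance matrices.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Whiten each block with its Cholesky factor: T = Lx^{-1} Sxy Ly^{-T}.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    T = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T

    # The singular values of T are the canonical correlations.
    U, s, Vt = np.linalg.svd(T)

    # Map the leading singular vectors back to the original coordinates.
    a = np.linalg.solve(Lx.T, U[:, 0])
    b = np.linalg.solve(Ly.T, Vt[0])
    return s[0], a, b
```

Applied to a purely quadratic relation such as y = x² with x symmetric around zero, this linear estimate is typically close to zero, which is the limitation that kernel extensions of CCA aim to address.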

RESULTS

TSKCCA first selects relevant kernels based on the Hilbert-Schmidt independence criterion (HSIC) in the multiple kernel learning framework. Weights are then derived by non-negative matrix decomposition with L1 regularization. Using artificial datasets and nutrigenomic datasets, we show that TSKCCA can extract multiple, nonlinear associations among high-dimensional data and multiplicative interactions among variables.
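The kernel-selection idea can be illustrated with a rough sketch. It is not the authors' implementation: it assumes one Gaussian kernel per feature with a median-heuristic bandwidth, uses the biased empirical HSIC estimator, and replaces the L1-constrained update with a simplified relative soft threshold. All function names and the `frac_u`/`frac_v` parameters are hypothetical.

```python
import numpy as np


def centered_gaussian_gram(v):
    """Centered Gaussian Gram matrix for one non-constant feature (1-D array).

    The bandwidth uses the median heuristic, an assumption of this sketch.
    """
    d2 = (v[:, None] - v[None, :]) ** 2
    width = np.sqrt(np.median(d2[d2 > 0]))
    K = np.exp(-d2 / (2.0 * width ** 2))
    n = len(v)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return H @ K @ H


def hsic(Kc, Lc):
    """Biased empirical HSIC, tr(K H L H) / (n - 1)^2, from centered Grams."""
    n = Kc.shape[0]
    return np.sum(Kc * Lc) / (n - 1) ** 2


def hsic_matrix(X, Y):
    """M[i, j] = HSIC between feature i of X and feature j of Y."""
    Kx = [centered_gaussian_gram(X[:, i]) for i in range(X.shape[1])]
    Ky = [centered_gaussian_gram(Y[:, j]) for j in range(Y.shape[1])]
    return np.array([[hsic(K, L) for L in Ky] for K in Kx])


def shrink_and_normalize(z, frac):
    """Non-negative soft threshold at frac * max(z), then unit L2 norm."""
    w = np.maximum(z - frac * z.max(), 0.0)
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w


def sparse_nonneg_rank1(M, frac_u=0.5, frac_v=0.5, n_iter=50):
    """Alternating updates for sparse, non-negative u and v with large u^T M v."""
    v = np.ones(M.shape[1]) / np.sqrt(M.shape[1])
    for _ in range(n_iter):
        u = shrink_and_normalize(M @ v, frac_u)
        v = shrink_and_normalize(M.T @ u, frac_v)
    return u, v
```

In this sketch, the nonzero entries of `u` and `v` mark the features of each dataset whose kernels would be retained. Larger `frac_u` and `frac_v` give sparser weights.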

CONCLUSIONS

TSKCCA can identify nonlinear associations among high-dimensional data more reliably than previous nonlinear CCA methods.

Figure 1 (image from the original article): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e524/5310015/8282bb26aec9/12859_2017_1543_Fig1_HTML.jpg
