Comparison of Penalty Functions for Sparse Canonical Correlation Analysis.

Author information

Chalise Prabhakar, Fridley Brooke L

Affiliation

Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905.

Publication information

Comput Stat Data Anal. 2012 Feb 1;56(2):245-254. doi: 10.1016/j.csda.2011.07.012.

PMID:21984855

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3185379/

Abstract

Canonical correlation analysis (CCA) is a widely used multivariate method for assessing the association between two sets of variables. However, when the number of variables far exceeds the number of subjects, as in large-scale genomic studies, traditional CCA is not appropriate. In addition, when the variables are highly correlated, the sample covariance matrices become unstable or undefined. To overcome these two issues, sparse canonical correlation analysis (SCCA) for multiple data sets has been proposed using a Lasso-type penalty. However, these methods provide no direct control over the sparsity of the solution. An additional filtering step based on the Bayesian Information Criterion (BIC) has also been suggested to further remove unimportant features. In this paper, four penalty functions for SCCA (Lasso, elastic net, smoothly clipped absolute deviation (SCAD) and hard threshold) are compared, with and without the BIC filtering step, using both real and simulated genotype and mRNA expression data. The results indicate that the SCAD penalty with the BIC filter would be the preferable penalty function for applying SCCA to genomic data.
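To make the comparison of the four penalties concrete, it may help to see how each acts on a single coefficient. The sketch below assumes the standard univariate (orthogonal-design) thresholding rules: soft thresholding for the Lasso, soft thresholding followed by ridge shrinkage for the naive elastic net, the usual SCAD rule with the conventional a = 3.7, and hard thresholding. The exact parameterization used in the paper is not stated here, and the function names are hypothetical.

```python
import math


def soft_threshold(z: float, lam: float) -> float:
    """Lasso rule: shrink z toward zero by lam, zeroing it if |z| <= lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def elastic_net_threshold(z: float, lam1: float, lam2: float) -> float:
    """Naive elastic-net rule: soft threshold by lam1, then shrink by 1/(1 + lam2).

    Solves min_b 0.5*(z - b)**2 + lam1*|b| + 0.5*lam2*b**2 (assumed parameterization).
    """
    return soft_threshold(z, lam1) / (1.0 + lam2)


def scad_threshold(z: float, lam: float, a: float = 3.7) -> float:
    """SCAD rule: soft thresholding near zero, linear interpolation in the
    middle range, and no shrinkage once |z| > a * lam."""
    abs_z = abs(z)
    if abs_z <= 2.0 * lam:
        return soft_threshold(z, lam)
    if abs_z <= a * lam:
        return ((a - 1.0) * z - math.copysign(a * lam, z)) / (a - 2.0)
    return z


def hard_threshold(z: float, lam: float) -> float:
    """Hard rule: keep z unchanged if |z| > lam, otherwise set it to zero."""
    return z if abs(z) > lam else 0.0


if __name__ == "__main__":
    lam = 1.0
    print(f"{'z':>5} {'lasso':>7} {'enet':>7} {'scad':>7} {'hard':>7}")
    for z in (0.5, 1.5, 3.0, 5.0):
        print(f"{z:5.1f} {soft_threshold(z, lam):7.3f} "
              f"{elastic_net_threshold(z, lam, 0.5):7.3f} "
              f"{scad_threshold(z, lam):7.3f} {hard_threshold(z, lam):7.3f}")
```

All four rules set small inputs to zero. The Lasso and elastic net also shrink every surviving coefficient, whereas SCAD, like hard thresholding, leaves large coefficients unshrunk while remaining continuous in z; hard thresholding jumps at |z| = lam.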

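The SCCA idea summarized in the abstract can be illustrated with a minimal alternating (power-iteration) sketch: standardize both data sets, form their cross-correlation matrix, then alternately update the two canonical weight vectors, thresholding and renormalizing after each update. This assumes identity within-set covariance matrices (a common simplification in high-dimensional SCCA) and uses Lasso soft thresholding by default. Any univariate rule with the signature `(z, lam) -> float` could be passed instead. It is an illustration under these assumptions, not the authors' implementation, and it omits the BIC filtering step. The names `scca`, `lam_u` and `lam_v` are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def standardize(data: Matrix) -> Matrix:
    """Center each column (rows = subjects) and scale it to unit sample variance."""
    n, p = len(data), len(data[0])
    out = [list(map(float, row)) for row in data]
    for j in range(p):
        col = [row[j] for row in data]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        for i in range(n):
            out[i][j] = (data[i][j] - mean) / sd if sd > 0 else 0.0
    return out


def soft_threshold(z: float, lam: float) -> float:
    """Lasso soft-thresholding rule used as the default sparsity step."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def normalize(w: Vector) -> Vector:
    """Scale w to unit Euclidean norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w] if norm > 0 else w


def scca(X: Matrix, Y: Matrix, lam_u: float, lam_v: float,
         threshold: Callable[[float, float], float] = soft_threshold,
         iters: int = 100, tol: float = 1e-8) -> Tuple[Vector, Vector]:
    """Alternating thresholded power iteration for one pair of sparse weights."""
    Xs, Ys = standardize(X), standardize(Y)
    n, p, q = len(Xs), len(Xs[0]), len(Ys[0])
    # Cross-correlation matrix C = Xs^T Ys / (n - 1), shape p x q.
    C = [[sum(Xs[i][j] * Ys[i][k] for i in range(n)) / (n - 1) for k in range(q)]
         for j in range(p)]
    u, v = [0.0] * p, normalize([1.0] * q)
    for _ in range(iters):
        # u-step: compute C v, threshold each entry, renormalize.
        u_new = normalize([threshold(sum(C[j][k] * v[k] for k in range(q)), lam_u)
                           for j in range(p)])
        # v-step: compute C^T u, threshold each entry, renormalize.
        v_new = normalize([threshold(sum(C[j][k] * u_new[j] for j in range(p)), lam_v)
                           for k in range(q)])
        delta = max(abs(a - b) for a, b in zip(u_new + v_new, u + v))
        u, v = u_new, v_new
        if delta < tol:
            break
    return u, v


def canonical_correlation(X: Matrix, Y: Matrix, u: Vector, v: Vector) -> float:
    """Correlation between the canonical variates Xs u and Ys v."""
    Xs, Ys = standardize(X), standardize(Y)
    xu = [sum(a * b for a, b in zip(row, u)) for row in Xs]
    yv = [sum(a * b for a, b in zip(row, v)) for row in Ys]
    # Columns are centered, so both variates already have mean zero.
    num = sum(a * b for a, b in zip(xu, yv))
    den = math.sqrt(sum(a * a for a in xu) * sum(b * b for b in yv))
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    # Toy data (rows = subjects): column 0 of X tracks column 0 of Y,
    # while column 1 of X is only weakly related to Y.
    X = [[1, 1], [2, -1], [3, 1], [4, -1], [5, 1], [6, -1]]
    Y = [[1, 1], [2, 0], [3, 0], [4, 0], [5, 0], [6, 1]]
    u, v = scca(X, Y, lam_u=0.5, lam_v=0.1)
    print("u =", u, "v =", v, "rho =", canonical_correlation(X, Y, u, v))
```

On this toy input, the projected score of the weakly related second column of X stays below `lam_u = 0.5`, so its weight is thresholded to zero and the solution keeps only the first column of each set. Larger penalties give sparser weights, down to all-zero vectors. This is why a separate step such as the BIC filter is needed to control the final number of selected features.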
