Comparison of Penalty Functions for Sparse Canonical Correlation Analysis.

Authors

Chalise Prabhakar, Fridley Brooke L

Affiliation

Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905.

Publication

Comput Stat Data Anal. 2012 Feb 1;56(2):245-254. doi: 10.1016/j.csda.2011.07.012.

Abstract

Canonical correlation analysis (CCA) is a widely used multivariate method for assessing the association between two sets of variables. However, when the number of variables far exceeds the number of subjects, as in large-scale genomic studies, the traditional CCA method is not appropriate. In addition, when the variables are highly correlated, the sample covariance matrices become ill-conditioned or singular, so their inverses, which CCA requires, are unstable or undefined. To overcome these two issues, sparse canonical correlation analysis (SCCA) for multiple data sets has been proposed using a Lasso-type penalty. However, these methods do not directly control the sparsity of the solution. An additional filtering step based on the Bayesian Information Criterion (BIC) has also been suggested to further remove unimportant features. In this paper, four penalty functions for SCCA (Lasso, Elastic-net, SCAD (smoothly clipped absolute deviation) and Hard-threshold), each with and without the BIC filtering step, are compared using both real and simulated genotypic and mRNA expression data. This study indicates that the SCAD penalty with the BIC filter would be the preferable penalty function for applying SCCA to genomic data.
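The four penalties differ mainly in how they shrink an individual canonical loading. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard univariate thresholding rules (soft thresholding for the Lasso; a naive elastic-net rule scaled by 1/(1 + lam2), which is one common parametrization; the SCAD rule of Fan and Li with their suggested default a = 3.7; and hard thresholding) inside an alternating power iteration that treats the within-set covariance matrices as identity. Thresholds are applied to unit-norm vectors, so `lam` should lie between 0 and 1. The helper names (`soft`, `elastic_net`, `scad`, `hard`, `scca`) and the fixed iteration count are hypothetical, columns of `X` and `Y` are assumed to be centred beforehand, and the BIC filtering step is not shown.

```python
import math


def soft(z, lam):
    """Lasso (soft) thresholding: sign(z) * max(|z| - lam, 0)."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def elastic_net(z, lam1, lam2):
    """Naive elastic-net rule: minimiser of 0.5*(z-b)^2 + lam1*|b| + 0.5*lam2*b^2."""
    return soft(z, lam1) / (1.0 + lam2)


def scad(z, lam, a=3.7):
    """SCAD thresholding rule of Fan and Li (2001); a = 3.7 is their suggested default."""
    az = abs(z)
    if az <= 2 * lam:
        return soft(z, lam)  # behaves like the Lasso near zero
    if az <= a * lam:
        return ((a - 1) * z - math.copysign(a * lam, z)) / (a - 2)
    return z  # large loadings are left unshrunk


def hard(z, lam):
    """Hard thresholding: keep z unchanged if |z| > lam, otherwise set it to zero."""
    return z if abs(z) > lam else 0.0


def matvec(A, w):
    """A @ w for a matrix A stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, w)) for row in A]


def matvec_t(A, w):
    """A^T @ w for a matrix A stored as a list of rows."""
    return [sum(A[i][j] * w[i] for i in range(len(A))) for j in range(len(A[0]))]


def normalize(w):
    """Scale w to unit Euclidean norm (an all-zero vector stays all zeros)."""
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w] if norm > 0 else [0.0] * len(w)


def scca(X, Y, threshold, n_iter=50):
    """First pair of sparse canonical vectors by alternating thresholded updates.

    Hypothetical helper: within-set covariance matrices are treated as identity,
    and `threshold` is any one-argument rule, e.g. lambda z: soft(z, 0.4).
    """
    v = normalize([1.0] * len(Y[0]))
    u = [0.0] * len(X[0])
    for _ in range(n_iter):
        # u is proportional to X^T Y v, thresholded element-wise, then renormalised
        u = normalize([threshold(z) for z in normalize(matvec_t(X, matvec(Y, v)))])
        # v is proportional to Y^T X u, updated the same way
        v = normalize([threshold(z) for z in normalize(matvec_t(Y, matvec(X, u)))])
    return u, v


if __name__ == "__main__":
    # Large loadings: the Lasso shrinks them by lam; SCAD and hard thresholding do not.
    print(soft(5.0, 1.0), scad(5.0, 1.0), hard(5.0, 1.0))  # 4.0 5.0 5.0
```

In this sketch, switching penalties only means passing a different `threshold`, for example `lambda z: scad(z, 0.4)`. The demo line shows the design difference: Lasso and elastic-net shrink every surviving loading, whereas SCAD and hard thresholding leave large loadings unbiased.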

Similar articles

Fast Multi-Task SCCA Learning with Feature Selection for Multi-Modal Brain Imaging Genetics.
Proceedings (IEEE Int Conf Bioinformatics Biomed). 2018 Dec;2018:356-361. doi: 10.1109/BIBM.2018.8621298. Epub 2019 Jan 24.

Cited by

Robust sparse canonical correlation analysis.
BMC Syst Biol. 2016 Aug 11;10(1):72. doi: 10.1186/s12918-016-0317-9.
ATHENA: the analysis tool for heritable and environmental network associations.
Bioinformatics. 2014 Mar 1;30(5):698-705. doi: 10.1093/bioinformatics/btt572. Epub 2013 Oct 21.
Population level inference for multivariate MEG analysis.
PLoS One. 2013 Aug 5;8(8):e71305. doi: 10.1371/journal.pone.0071305. Print 2013.
