Du Lei, Huang Heng, Yan Jingwen, Kim Sungeun, Risacher Shannon L, Inlow Mark, Moore Jason H, Saykin Andrew J, Shen Li
Department of Radiology and Imaging Sciences, Indiana University, Indianapolis, IN, USA.
Department of Computer Science & Engineering, The University of Texas at Arlington, Arlington, TX, USA.
Bioinformatics. 2016 May 15;32(10):1544-51. doi: 10.1093/bioinformatics/btw033. Epub 2016 Jan 21.
Structured sparse canonical correlation analysis (SCCA) models have been used to identify imaging genetic associations. These models use either group lasso or graph-guided fused lasso to perform feature selection and feature grouping simultaneously. Group lasso based methods require prior knowledge to define the groups, which limits their applicability when prior knowledge is incomplete or unavailable. Graph-guided methods overcome this drawback by using the sample correlation to define the constraint. However, they are sensitive to the sign of the sample correlation, which can introduce undesirable bias if the sign is wrongly estimated.
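The sign sensitivity described above can be illustrated with a small numeric sketch. It assumes the common pairwise form of a graph-guided fused lasso term, |u_i - sign(r_ij) u_j|, where u_i and u_j are canonical loadings and r_ij is the sample correlation between two features. The function names, the toy values, and the sign-free comparison term are hypothetical illustrations only; the sign-free term is not taken from this abstract and should not be read as the authors' penalty.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def fused_term(u_i, u_j, r_ij):
    """Assumed pairwise graph-guided fused lasso term: |u_i - sign(r_ij) * u_j|."""
    return abs(u_i - sign(r_ij) * u_j)


def sign_free_term(u_i, u_j):
    """Hypothetical sign-free comparison term: | |u_i| - |u_j| |."""
    return abs(abs(u_i) - abs(u_j))


# Toy loadings for two features that are truly negatively correlated,
# so loadings of equal magnitude and opposite sign are the desired pattern.
u_i, u_j = 0.8, -0.8

# Sample correlation with the correct (negative) sign: the pattern is not penalized.
correct_sign = fused_term(u_i, u_j, r_ij=-0.3)

# Sample correlation with a wrongly estimated (positive) sign:
# the same desired pattern is now penalized.
wrong_sign = fused_term(u_i, u_j, r_ij=0.1)

# The sign-free term does not depend on r_ij at all.
sign_free = sign_free_term(u_i, u_j)

print(correct_sign, wrong_sign, sign_free)
```

In this toy setting the fused term is zero when the sign of r_ij is right and large when it is wrong, even though the loadings are identical in both cases. A weak sample correlation close to zero is exactly where such a sign flip is most likely.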
We introduce a novel SCCA model with a new penalty, and develop an efficient optimization algorithm. Our method has a strong upper bound for the grouping effect for both positively and negatively correlated features. We show that our method performs as well as or better than three competing SCCA models on both synthetic and real data. In particular, our method identifies stronger canonical correlations and better canonical loading patterns, showing its promise for revealing interesting imaging genetic associations.
The Matlab code and sample data are freely available at http://www.iu.edu/~shenlab/tools/angscca/
Supplementary data are available at Bioinformatics online.