Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA.
Biostatistics. 2013 Apr;14(2):244-58. doi: 10.1093/biostatistics/kxs038. Epub 2012 Oct 15.
Motivated by the study of the association between nutrient intake and human gut microbiome composition, we developed a method for structure-constrained sparse canonical correlation analysis (ssCCA) in a high-dimensional setting. ssCCA takes into account the phylogenetic relationships among bacteria, which provide important prior knowledge about the evolutionary relatedness of bacterial taxa. Our ssCCA formulation uses a phylogenetic structure-constrained penalty function that imposes a degree of smoothness on the linear coefficients according to the phylogenetic relationships among the taxa. An efficient coordinate descent algorithm is developed for the optimization. A human gut microbiome data set is used to illustrate the method. Both simulations and real data applications show that ssCCA performs better than standard sparse CCA in identifying meaningful variables when the data have such structure.
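The abstract names the ingredients (a sparsity penalty, a phylogeny-based smoothness penalty, and coordinate descent) but not the exact formulation. The Python sketch below assumes one plausible version and is not the authors' implementation: the structure penalty is taken to be a graph-Laplacian term u'Lu built from a hypothetical taxon-similarity matrix `W` (for instance, derived from phylogenetic distances), both coefficient vectors get a lasso penalty, and within-set covariances are replaced by the identity, as in diagonal penalized CCA. All function and parameter names (`structured_sparse_cca`, `penalized_direction`, `lam_u`, `lam_v`, `lam_L`) are illustrative.

```python
# A minimal sketch of a rank-1 structure-constrained sparse CCA, NOT the authors' code.
# Assumptions (hypothetical): identity within-set covariances, lasso penalties on
# both sides, and a Laplacian smoothness penalty u'Lu on the taxon side, with L
# built from a user-supplied, non-negative taxon similarity matrix W.

def center(M):
    """Subtract each column mean from a matrix stored as a list of rows."""
    n = len(M)
    means = [sum(row[j] for row in M) / n for j in range(len(M[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in M]

def laplacian(W):
    """L = D - W for a symmetric, non-negative similarity matrix W with zero diagonal.
    Then u'Lu equals the sum over pairs i<j of W[i][j] * (u_i - u_j)**2."""
    p = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(p)]
            for i in range(p)]

def soft_threshold(z, lam):
    """Lasso shrinkage: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def penalized_direction(a, L, lam1, lam_L, n_iter=200, tol=1e-10):
    """Coordinate descent for
        min_u  -a'u + 0.5 * u'(I + lam_L * L) u + lam1 * ||u||_1,
    then rescale u to unit Euclidean norm (left at zero if fully shrunk)."""
    p = len(a)
    Q = [[(1.0 if i == j else 0.0) + lam_L * L[i][j] for j in range(p)]
         for i in range(p)]
    u = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Partial residual: a_j minus the coupling to every other coordinate.
            r = a[j] - sum(Q[j][k] * u[k] for k in range(p) if k != j)
            new = soft_threshold(r, lam1) / Q[j][j]
            max_change = max(max_change, abs(new - u[j]))
            u[j] = new
        if max_change < tol:
            break
    norm = sum(x * x for x in u) ** 0.5
    return [x / norm for x in u] if norm > 0 else u

def structured_sparse_cca(X, Y, W, lam_u, lam_v, lam_L, n_iter=50):
    """Alternating penalized updates of the taxon weights u (lasso + Laplacian)
    and the nutrient weights v (lasso only) for the cross-product u'X'Yv."""
    Xc, Yc = center(X), center(Y)
    n, p, q = len(Xc), len(Xc[0]), len(Yc[0])
    # Cross-product matrix C = X'Y of the column-centred data.
    C = [[sum(Xc[i][s] * Yc[i][t] for i in range(n)) for t in range(q)]
         for s in range(p)]
    L = laplacian(W)
    no_structure = [[0.0] * q for _ in range(q)]
    u, v = [0.0] * p, [1.0 / q ** 0.5] * q
    for _ in range(n_iter):
        a = [sum(C[s][t] * v[t] for t in range(q)) for s in range(p)]  # X'Yv
        u = penalized_direction(a, L, lam_u, lam_L)
        b = [sum(C[s][t] * u[s] for s in range(p)) for t in range(q)]  # Y'Xu
        v = penalized_direction(b, no_structure, lam_v, 0.0)
    return u, v
```

With `W` linking two taxa, raising `lam_L` pulls their weights toward each other, while raising `lam_u` sets weakly associated taxa to exactly zero. Setting `lam_L = 0` reduces each update to soft-thresholding followed by normalization, which is the usual sparse CCA step without structure.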