Department of Epidemiology & Biostatistics, School of Rural Public Health, Texas A&M Health Science Center, College Station, TX 77843-1266, USA; School of Mathematics and Statistics, University of Sydney, NSW 2006, Australia; Department of Statistics, Texas A&M University, College Station, TX 77843-3143, USA; and Department of Poultry Science, Intercollegiate Faculty of Nutrition, Texas A&M University, College Station, TX 77840, USA.
Bioinformatics. 2014 Mar 15;30(6):831-7. doi: 10.1093/bioinformatics/btt608. Epub 2013 Oct 24.
Gut microbiota can be classified at multiple taxonomic levels. Strategies that use changes in microbiota composition to improve health require knowing which taxonomic level an intervention should target. Identifying these important levels is difficult, however, because most statistical methods analyze microbiota classified at a single taxonomic level rather than at several levels simultaneously.
Using L1 and L2 regularization, we developed a new variable selection method that identifies important features at multiple taxonomic levels. The regularization parameters are chosen by a new, data-adaptive, repeated cross-validation approach, which performed well. In simulation studies, our method outperformed competing methods: it selected the significant variables more often, with small false discovery rates and acceptable false-positive rates. Applying our method to gut microbiota data, we identified the taxonomic levels most altered by specific interventions or physiological status.
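To make the idea of combining L1 and L2 penalties with repeated cross-validation concrete, the following is a minimal sketch in Python. It is not the authors' method or their R package: it uses a generic elastic-net penalty fitted by coordinate descent, and picks the penalty strength by plain repeated K-fold cross-validation rather than the data-adaptive scheme the abstract describes. The function names (`elastic_net`, `repeated_cv_lambda`), the mixing parameter `alpha`, and the candidate grid of `lam` values are all hypothetical. One could imagine stacking features from several taxonomic levels as columns of `X`, but how the published method handles the levels is not stated here.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator that produces exact zeros (the L1 part)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam, alpha=0.5, n_iter=100):
    """Coordinate descent for the generic elastic-net objective
    (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1-alpha)/2 * ||b||_2^2).
    Assumption: y and the columns of X are centered (no intercept is fitted)."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    beta = [0.0] * p
    resid = list(y)  # residual y - X*beta, with beta starting at zero
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            denom = sum(c * c for c in col) / n + lam * (1.0 - alpha)
            new = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    return beta


def repeated_cv_lambda(X, y, lambdas, alpha=0.5, k=5, repeats=10, seed=0):
    """Pick lam by K-fold cross-validation repeated over several random splits.
    The squared prediction error is summed over all folds and repeats."""
    rng = random.Random(seed)
    n = len(X)
    total_err = {lam: 0.0 for lam in lambdas}
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for fold in (idx[f::k] for f in range(k)):
            held = set(fold)
            X_tr = [X[i] for i in idx if i not in held]
            y_tr = [y[i] for i in idx if i not in held]
            for lam in lambdas:
                b = elastic_net(X_tr, y_tr, lam, alpha)
                total_err[lam] += sum(
                    (y[i] - sum(x * bj for x, bj in zip(X[i], b))) ** 2
                    for i in fold
                )
    return min(lambdas, key=lambda lam: total_err[lam])


if __name__ == "__main__":
    # Synthetic data (illustrative only): only the first of five features matters.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
    y = [2.0 * row[0] + 0.1 * rng.gauss(0, 1) for row in X]
    best = repeated_cv_lambda(X, y, [0.01, 0.1, 1.0], repeats=3)
    print(best, elastic_net(X, y, best))
```

Repeating the random fold assignment averages out the dependence of the chosen penalty on any single split, which is the general motivation for repeated cross-validation. The L1 term is what sets coefficients exactly to zero and so performs the variable selection.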
The new approach is implemented in an R package, which is freely available from the corresponding author.
Supplementary data are available at Bioinformatics online.