Variable selection in microbiome compositional data analysis.

Authors

Susin Antoni, Wang Yiwen, Lê Cao Kim-Anh, Calle M Luz

Affiliations

Mathematical Department, UPC-Barcelona Tech, 08028 Barcelona, Spain.

Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, Parkville, VIC 3010, Australia.

Publication information

NAR Genom Bioinform. 2020 May 13;2(2):lqaa029. doi: 10.1093/nargab/lqaa029. eCollection 2020 Jun.

Abstract

Though variable selection is one of the most relevant tasks in microbiome analysis, e.g. for the identification of microbial signatures, many studies still rely on methods that ignore the compositional nature of microbiome data. The applicability of compositional data analysis methods has been hampered by the limited availability of software and the difficulty in interpreting their results. This work is focused on three methods for variable selection that acknowledge the compositional structure of microbiome data: selbal, a forward selection approach for the identification of compositional balances, and clr-lasso and coda-lasso, two penalized regression models for compositional data analysis. This study highlights the link between these methods and brings out some limitations of the centered log-ratio (clr) transformation for variable selection. In particular, the fact that the clr transformation is not subcompositionally consistent makes the microbial signatures obtained from clr-lasso not readily transferable. Coda-lasso is computationally efficient and suitable when the focus is the identification of the most associated microbial taxa. Selbal stands out when the goal is to obtain a parsimonious model with optimal prediction performance, but it is computationally greedy. We provide a reproducible vignette for the application of these methods that will enable researchers to fully leverage their potential in microbiome studies.
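
The lack of subcompositional consistency noted in the abstract can be illustrated with a minimal Python sketch. The function names and the abundance values below are hypothetical and chosen only for illustration; they are not taken from the paper or its vignette. The sketch computes the centered log-ratio of one sample twice, once with four taxa and once with only the first three, and compares the result with a pairwise log-ratio.

```python
import math

def closure(x):
    """Rescale positive parts so they sum to 1 (a composition)."""
    total = sum(x)
    return [v / total for v in x]

def clr(x):
    """Centered log-ratio: log of each part minus the mean log over all parts."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical abundances of four taxa (illustrative values only).
full = closure([10.0, 20.0, 30.0, 40.0])
# The same sample when only the first three taxa are retained.
sub = closure(full[:3])

clr_full = clr(full)
clr_sub = clr(sub)

# The clr value of the first taxon depends on which taxa are included ...
clr_shift = abs(clr_full[0] - clr_sub[0])
# ... whereas the log-ratio between the first two taxa does not.
ratio_full = math.log(full[0] / full[1])
ratio_sub = math.log(sub[0] / sub[1])

print(f"clr shift for taxon 1: {clr_shift:.3f}")
print(f"log-ratio taxon 1 vs 2: {ratio_full:.3f} (full), {ratio_sub:.3f} (sub)")
```

Because the clr value of a taxon changes when other taxa are dropped, coefficients fitted on clr-transformed data are tied to the particular set of taxa used, which is consistent with the abstract's remark that such signatures are not readily transferable.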


