Ditzler Gregory, Morrison J Calvin, Lan Yemin, Rosen Gail L
Department of Electrical & Computer Engineering, The University of Arizona, 1230 E Speedway Blvd., ECE Bldg., Tucson, 85721, AZ, USA.
Department of Electrical & Computer Engineering, Drexel University, 3141 Chestnut St., Philadelphia, 19104, PA, USA.
BMC Bioinformatics. 2015 Nov 4;16:358. doi: 10.1186/s12859-015-0793-8.
Some current software tools for comparative metagenomics let ecologists investigate and explore bacterial communities using α- and β-diversity. Feature subset selection, a subfield of machine learning, can also provide insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can identify the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we used information-theoretic feature selection to identify the protein family abundances that best discriminate between age groups in the human gut microbiome.
We have developed Fizzy, a new Python command-line tool for microbial ecologists that implements information-theoretic feature subset selection methods for biological data formats and is compatible with the widely adopted BIOM format. We demonstrate the tool's capabilities on publicly available datasets.
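To make the idea of information-theoretic feature selection concrete, the following is a minimal sketch that ranks features by their estimated mutual information with the class label (a simple "mutual information maximization" filter). It is an illustration under stated assumptions, not Fizzy's implementation or interface: the function names, the toy OTU table, and the labels are all hypothetical, and abundances are assumed to be already discretized.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from two equal-length sequences of discrete values."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of feature values
    py = Counter(ys)            # marginal counts of class labels
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written in terms of counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(table, labels, k):
    """Return the k feature names with the highest mutual information with labels.

    table maps a feature name (e.g. an OTU) to its discretized abundance
    in each sample; labels holds one class label per sample.
    """
    scores = {name: mutual_information(vals, labels) for name, vals in table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: three OTUs measured in six samples.
otu_table = {
    "OTU_a": [0, 0, 0, 1, 1, 1],  # tracks the label exactly
    "OTU_b": [0, 1, 0, 1, 0, 1],  # only weakly related to the label
    "OTU_c": [1, 1, 1, 1, 1, 1],  # constant, carries no information
}
labels = ["infant", "infant", "infant", "adult", "adult", "adult"]

top = rank_features(otu_table, labels, k=2)
print(top)
```

Scoring each feature independently, as this sketch does, ignores redundancy between features; information-theoretic methods that also account for redundancy exist, but they are beyond what this small example shows.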
Fizzy is available to the public under the GNU GPL. The standalone implementation can be found at http://github.com/EESI/Fizzy.