Microbiome. 2013 Jan 9;1(1):2. doi: 10.1186/2049-2618-1-2.
Human gut microbial functions are often associated with various diseases and host physiologies. Aging, a less explored factor, is also suspected to affect or be affected by microbiome alterations. By combining functional feature selection with supervised classification, we aim to facilitate identification of age-related functional characteristics in metagenomes from several human gut microbiome studies (MetaHIT, MicroAge, MicroObes, and the datasets of Kurokawa et al. and Gill et al.).
We apply two feature selection methods, term frequency-inverse document frequency (TF-iDF) and minimum-redundancy maximum-relevancy (mRMR), to identify functional signatures that differentiate metagenomes by age. After the features are reduced, we use a support vector machine (SVM) to predict the host age of new metagenomes. Functional features are drawn from protein families (Pfams), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, KEGG ontologies and the Gene Ontology (GO) database. Initial investigations demonstrate that ordination of the functional principal components shows great overlap between different age groups. However, when feature selection is applied, mRMR tightens the ordination cluster for each age group, and TF-iDF offers better linear separation. Used in conjunction with an SVM classifier on Pfams, both TF-iDF and mRMR achieved areas under the receiver operating characteristic curve (AUCs) 10 to 15% above chance when classifying individuals as above or below a mid-range age threshold (about 38 to 43 years old). Better performance around these mid-range ages is also observed when using other functional categories and an age-balanced dataset. Using another feature selection method, LEfSe, on an age-balanced dataset, we also identified some age-related Pfams that improved age discrimination at age 65. The selected functional characteristics identify a broad range of age-relevant metabolisms, such as reduced vitamin B12 synthesis, reduced activity of reductases, increased DNA damage, occurrences of stress responses and immune system compromise, and upregulated glycosyltransferases in the aging population.
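The TF-iDF selection and AUC evaluation steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: each metagenome is treated as a document and each Pfam as a term, the inverse document frequency is smoothed, and features are ranked by the gap in mean TF-iDF weight between the two age classes. The function names (`tfidf_matrix`, `select_top_features`, `roc_auc`) and the toy count matrix are hypothetical. The SVM step is omitted to keep the example dependency-free; in practice a classifier such as scikit-learn's `SVC` could be trained on the selected columns.

```python
import math
from typing import List, Sequence


def tfidf_matrix(counts: Sequence[Sequence[float]]) -> List[List[float]]:
    """Weight counts by TF-iDF, treating each sample as a document (assumption)."""
    n_samples = len(counts)
    n_features = len(counts[0])
    # Document frequency: number of samples in which each family is present.
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(n_features)]
    # Smoothed IDF (an assumption) so ubiquitous families keep a positive weight.
    idf = [math.log((1 + n_samples) / (1 + df)) + 1.0 for df in doc_freq]
    weighted = []
    for row in counts:
        total = sum(row) or 1.0  # guard against empty samples
        weighted.append([(c / total) * idf[j] for j, c in enumerate(row)])
    return weighted


def select_top_features(counts, labels: Sequence[int], k: int) -> List[int]:
    """Return indices of the k features with the largest class gap in mean TF-iDF."""
    weighted = tfidf_matrix(counts)
    pos = [row for row, y in zip(weighted, labels) if y == 1]
    neg = [row for row, y in zip(weighted, labels) if y == 0]
    gaps = []
    for j in range(len(weighted[0])):
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        gaps.append((abs(mean_pos - mean_neg), j))
    # Largest gap first; ties broken by feature index for determinism.
    gaps.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in gaps[:k]]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: rows are metagenomes, columns are hypothetical Pfam counts.
counts = [
    [10, 0, 5, 1],
    [12, 1, 4, 0],
    [11, 0, 6, 1],
    [3, 8, 5, 1],
    [2, 9, 4, 0],
    [4, 7, 6, 1],
]
labels = [0, 0, 0, 1, 1, 1]  # 0 = below the age threshold, 1 = above

selected = select_top_features(counts, labels, k=2)
print("selected feature indices:", selected)
```

In this toy matrix the first two columns differ strongly between the classes while the last two do not, so the ranking keeps the first two. An AUC of 0.5 corresponds to chance, which is the baseline against which the reported 10 to 15% improvement is measured.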
Feature selection can yield biologically meaningful results when used in conjunction with classification, and makes age classification of new human gut metagenomes feasible. While we demonstrate the promise of this approach, the data-dependent prediction performance could be further improved. We hypothesize that while the Qin et al. dataset is the most comprehensive to date, even deeper sampling is needed to better characterize and predict the microbiomes' functional content.