Phy-PMRFI: Phylogeny-Aware Prediction of Metagenomic Functions Using Random Forest Feature Importance.

Publication information

IEEE Trans Nanobioscience. 2019 Jul;18(3):273-282. doi: 10.1109/TNB.2019.2912824. Epub 2019 Apr 24.

Abstract

High-throughput sequencing techniques have accelerated functional metagenomics studies by generating large volumes of omics data. Integrating these data using computational approaches is potentially useful for predicting metagenomic functions. Machine learning (ML) models can be trained on microbial features and then used to classify microbial data into different functional classes. For example, ML analyses of human microbiome data have been linked to the prediction of important biological states. When analysing omics data, it is important to integrate the abundance counts of taxonomic features with their biological relationships. These relationships can potentially be uncovered from the phylogenetic tree of the microbial taxa. In this paper, we propose a novel integrative framework, Phy-PMRFI. The framework is driven by phylogeny-based modeling of omics data and predicts metagenomic functions using important features selected by a random forest importance (RFI) strategy. It integrates the underlying phylogenetic tree information with abundance measures of microbial species (features) by creating a novel phylogeny and abundance aware matrix structure (PAAM). Phy-PMRFI then ranks the microbial features using an RFI measure, and the resulting feature set is used as input for microbiome classification. This feature set enhances the performance of state-of-the-art methods such as support vector machines. The proposed framework also outperforms the state-of-the-art pipelines of phylogenetic isometric log-ratio transform (PhILR) and MetaPhyl. On the human throat microbiome, Phy-PMRFI obtains a prediction accuracy of 90%, compared with 53% for PhILR and 71% for MetaPhyl.
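The feature-ranking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define how PAAM is built, so the input here is a plain synthetic abundance-like table, and depth-one trees (stumps) stand in for full decision trees to keep the example dependency-free. All names (`gini`, `best_split`, `stump_forest_importance`) and the synthetic data are assumptions made for illustration. In practice a library random forest, for example scikit-learn's `RandomForestClassifier` with its `feature_importances_` attribute, would normally be used instead.

```python
import random

def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(X, y, candidates):
    """Best single-threshold split among the candidate features.

    Returns (feature_index, impurity_decrease); feature_index is None
    when no candidate reduces the impurity.
    """
    n = len(y)
    parent = gini(y)
    best_feature, best_gain = None, 0.0
    for f in candidates:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint threshold between adjacent values
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            children = (len(left) * gini(left) + len(right) * gini(right)) / n
            if parent - children > best_gain:
                best_feature, best_gain = f, parent - children
    return best_feature, best_gain

def stump_forest_importance(X, y, n_trees=100, n_candidates=3, seed=0):
    """Normalized total decrease in Gini impurity over bootstrapped stumps."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    importance = [0.0] * d
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]        # bootstrap sample
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        candidates = rng.sample(range(d), n_candidates)   # random feature subset
        f, gain = best_split(Xb, yb, candidates)
        if f is not None:
            importance[f] += gain
    total = sum(importance)
    return [v / total for v in importance] if total > 0 else importance

# Synthetic data (an assumption, not from the paper): 40 samples, 8 features.
# Only features 0 and 1 differ between the two classes.
data_rng = random.Random(42)
y = [i % 2 for i in range(40)]
X = [[data_rng.gauss(10.0 * label, 1.0) if j < 2 else data_rng.gauss(0.0, 1.0)
      for j in range(8)] for label in y]

importance = stump_forest_importance(X, y)
ranked = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
top_features = ranked[:2]  # keep the two highest-ranked features
print("ranked features:", ranked)
print("selected:", sorted(top_features))
```

The selected columns would then be passed to a downstream classifier such as a support vector machine, which is the role the abstract assigns to the ranked feature set. The cutoff of two features is arbitrary here; the abstract does not state how many features Phy-PMRFI retains.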

