DeepToA: an ensemble deep-learning approach to predicting the theater of activity of a microbiome.

Affiliations

Department of Algorithms in Bioinformatics, Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen 72076, Germany.

International Max Planck Research School "From Molecules to Organisms", Max Planck Institute for Biology Tübingen, Tübingen 72076, Germany.

Publication information

Bioinformatics. 2022 Oct 14;38(20):4670-4676. doi: 10.1093/bioinformatics/btac584.

Abstract

MOTIVATION

Metagenomics is the study of microbiomes using DNA sequencing. A microbiome consists of an assemblage of microbes that is associated with a 'theater of activity' (ToA). An important question is: to what degree does the taxonomic and functional content of the former depend on the (details of the) latter? Here, we investigate a related technical question: given a taxonomic and/or functional profile estimated from metagenomic sequencing data, how can the associated ToA be predicted? We present a deep-learning approach to this question. We use both taxonomic and functional profiles as input. We apply node2vec to embed hierarchical taxonomic profiles into numerical vectors. We then perform dimension reduction using clustering, to address the sparseness of the taxonomic data and thus make the problem more amenable to deep-learning algorithms. Functional features are combined with textual descriptions of protein families or domains. We present an ensemble deep-learning framework, DeepToA, for predicting the ToA of a microbial community based on taxonomic and functional profiles. We use SHAP (SHapley Additive exPlanations) values to determine which taxonomic and functional features are important for the prediction.
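
The idea of reducing a sparse taxonomic profile by clustering embedded taxa can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm DeepToA uses or how abundances are aggregated, so everything below is an assumption made for illustration: plain k-means stands in for the clustering step, abundances are summed per cluster, and the 2-D vectors, taxon names and function names are hypothetical placeholders for real node2vec embeddings.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def reduce_profile(profile, taxon_to_cluster, k):
    """Collapse a sparse {taxon: abundance} profile into k per-cluster totals."""
    reduced = [0.0] * k
    for taxon, abundance in profile.items():
        reduced[taxon_to_cluster[taxon]] += abundance
    return reduced


# Hypothetical 2-D embeddings standing in for node2vec output (assumption).
taxon_embeddings = {
    "taxon_A": (0.0, 0.1),
    "taxon_B": (5.0, 5.1),
    "taxon_C": (0.2, 0.0),
    "taxon_D": (5.2, 4.9),
}

taxa = list(taxon_embeddings)
labels = kmeans([taxon_embeddings[t] for t in taxa], k=2)
taxon_to_cluster = dict(zip(taxa, labels))

# A sparse sample: most taxa are absent, which is the motivation for reduction.
sample = {"taxon_A": 3.0, "taxon_D": 1.0, "taxon_C": 2.0}
reduced = reduce_profile(sample, taxon_to_cluster, k=2)
print(reduced)
```

In this toy case, taxa whose embeddings lie close together share a cluster, so the sample is described by one dense value per cluster rather than one mostly-zero value per taxon. A real pipeline would use many more taxa, higher-dimensional embeddings and a library clustering implementation.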

RESULTS

On 7560 metagenomic profiles downloaded from MGnify and classified into 10 different theaters of activity, DeepToA achieves an accuracy of 98.30%. Adding textual information to the functional features increases the accuracy.

AVAILABILITY AND IMPLEMENTATION

Our approach is available at http://ab.inf.uni-tuebingen.de/software/deeptoa.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
