Integrated gene prediction and peptide assembly of metagenomic sequencing data.

Author information

Thippabhotla Sirisha, Liu Ben, Podgorny Adam, Yooseph Shibu, Yang Youngik, Zhang Jun, Zhong Cuncong

Affiliations

Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS 66045, USA.

Center for Computational Biology, The University of Kansas, Lawrence, KS 66045, USA.

Publication information

NAR Genom Bioinform. 2023 Mar 11;5(1):lqad023. doi: 10.1093/nargab/lqad023. eCollection 2023 Mar.

Abstract

Metagenomics is the study of all genomic content contained in given microbial communities. Metagenomic functional analysis aims to quantify protein families and reconstruct metabolic pathways from the metagenome. It plays a central role in understanding the interaction between the microbial community and its host or environment. De novo functional analysis, which allows the discovery of novel protein families, remains challenging for high-complexity communities. There are currently three main approaches for recovering novel genes or proteins: nucleotide assembly, gene calling and peptide assembly. Unfortunately, their information dependency has been overlooked, and each has been formulated as an independent problem. In this work, we develop a workflow called integrated Metagenomic Protein Predictor (iMPP), which leverages the information dependencies for better functional analysis. iMPP contains three novel modules: a hybrid assembly graph generation module, a graph-based gene calling module, and a peptide assembly-based refinement module. iMPP significantly improved the existing gene calling sensitivity on unassembled metagenomic reads, achieving a 92-97% recall rate at a high precision level (>85%). iMPP further allowed for more sensitive and accurate peptide assembly, recovering more reference proteins and delivering more hypothetical protein sequences. The high performance of iMPP can provide a more comprehensive and unbiased view of the microbial communities under investigation. iMPP is freely available from https://github.com/Sirisha-t/iMPP.
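
The recall and precision figures quoted above can be read as set-overlap ratios between the reads a gene caller labels as coding and the reads that truly overlap a gene. The abstract does not say exactly how the authors scored their predictions, so the following is only a minimal sketch under that assumption; the function name `precision_recall` and the read IDs are hypothetical and are not part of iMPP.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) for predicted items scored against a truth set."""
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)  # items that were both predicted and true
    # Guard against empty sets so the ratios are always defined.
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data (not from the paper): IDs of reads that truly overlap
# a coding region, and IDs of reads that a gene caller labeled as coding.
truth_reads = {"r1", "r2", "r3", "r4", "r5"}
called_reads = {"r1", "r2", "r3", "r4", "r9"}

precision, recall = precision_recall(called_reads, truth_reads)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case four of the five called reads are correct and four of the five true coding reads are found, so both ratios are 0.8. A caller that labels more reads as coding typically gains recall at the cost of precision, which is why the abstract reports recall at a stated precision level.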
