HMMER-Extractor: an auxiliary toolkit for identifying genomic macromolecular metabolites based on Hidden Markov Models.

Authors

Yang Jing, Sun Siqi, Sun Ning, Lu Li, Zhang Chengwu, Shi Wanyu, Zhao Yunhe, Jia Shulei

Affiliations

School of Basic Medical Sciences, Shanxi Medical University, Taiyuan 030001, China.

Department of Cardiology, Peking Union Medical College Hospital, Peking Union Medical College & Chinese Academy of Medical Sciences, Beijing, China.

Publication information

Int J Biol Macromol. 2024 Dec;283(Pt 2):137666. doi: 10.1016/j.ijbiomac.2024.137666. Epub 2024 Nov 17.

Abstract

The human microbiome contains various microbial macromolecules with important biological functions. Hidden Markov Models (HMMs) can detect distantly related sequences with low similarity and are widely implemented in sequence alignment software. However, HMM-based sequence alignments can generate a large number of results, and quickly screening and batch-extracting target homologs from microbiomes remains a major sticking point. An integrated gene filtering and extraction pipeline is therefore needed to screen homologs quickly and accurately. Here, we introduce HMMER-Extractor, a supporting toolkit that extracts amino acid or nucleotide sequences using filtering scores and an iterative keyword matching (IKM) logic. To make it more user-friendly and accessible, we also present a visual web server platform. Its interactive HTML output offers a convenient way to browse homologous annotations and extract sequences, giving the community a streamlined interface for analyzing microbiomes. Using HMMER-Extractor, we constructed a cardiovascular disease-related gene dataset for the macromolecular metabolites trimethylamine (TMA) and lipopolysaccharide (LPS) based on 46,699 bacterial genomes from the human gut. Approximately 21,014 and 1,961 bacterial strains were identified as containing the cnt or cut operon of TMA and the waa gene cluster of LPS, respectively. Escherichia coli, which belongs to the phylum Proteobacteria, accounted for the largest proportion among all bacterial species. The HMMER-Extractor toolkit is an integrated pipeline that has proven accurate and fast in extracting target macromolecule-encoding genes from microbial genomes.
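The abstract names three steps (score filtering, iterative keyword matching, and sequence extraction) without detailing them. The Python sketch below illustrates one plausible reading and is not the HMMER-Extractor code. It assumes hits arrive in the layout of HMMER's `--tblout` table (18 whitespace-separated columns followed by a free-text description), and it interprets "iterative keyword matching" as narrowing the hit list one keyword at a time, which is an assumption. All function names, thresholds, and the toy records (`geneA`, `cutC_model`, and so on) are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    target: str       # target sequence name
    query: str        # profile HMM name
    evalue: float     # full-sequence E-value
    score: float      # full-sequence bit score
    description: str  # free-text target description


def parse_tblout(lines: Iterable[str]) -> List[Hit]:
    """Parse rows laid out like HMMER's --tblout table; '#' lines are comments."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(None, 18)  # 18 fixed columns, then the description
        if len(fields) < 18:
            continue  # skip malformed rows
        desc = fields[18].strip() if len(fields) > 18 else ""
        hits.append(Hit(fields[0], fields[2], float(fields[4]), float(fields[5]), desc))
    return hits


def filter_hits(hits: List[Hit], max_evalue: float = 1e-5, min_score: float = 50.0) -> List[Hit]:
    """Keep hits passing both thresholds (illustrative defaults, not the tool's)."""
    return [h for h in hits if h.evalue <= max_evalue and h.score >= min_score]


def keyword_passes(hits: List[Hit], keywords: List[str]) -> List[Hit]:
    """Assumed reading of iterative keyword matching: one narrowing pass per keyword."""
    current = list(hits)
    for kw in keywords:
        current = [h for h in current if kw.lower() in h.description.lower()]
    return current


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA records into {name: sequence}; the name is the first header token."""
    seqs: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None and line:
            seqs[name] += line
    return seqs


def extract(hits: List[Hit], seqs: Dict[str, str]) -> Dict[str, str]:
    """Pull the sequences of the retained hits out of the FASTA records."""
    return {h.target: seqs[h.target] for h in hits if h.target in seqs}


# Toy inputs invented for illustration only.
TBLOUT = """# target name  accession  query name  accession  E-value  score  bias ...
geneA - cutC_model - 1.2e-80 270.5 0.1 1.5e-80 270.2 0.1 1.0 1 0 0 1 1 1 1 choline trimethylamine-lyase CutC
geneB - cutC_model - 3.0e-02 12.4 0.0 4.0e-02 12.0 0.0 1.0 1 0 0 1 1 1 1 glycyl radical enzyme
geneC - cutC_model - 5.0e-40 140.0 0.2 6.0e-40 139.8 0.2 1.0 1 0 0 1 1 1 1 pyruvate formate-lyase
"""
FASTA = """>geneA toy record
MKT
LLV
>geneB
MAA
>geneC
MGG
"""

if __name__ == "__main__":
    kept = filter_hits(parse_tblout(TBLOUT.splitlines()))
    matched = keyword_passes(kept, ["choline", "lyase"])
    print(extract(matched, read_fasta(FASTA.splitlines())))
```

Filtering on scores before matching keywords keeps the text search restricted to statistically credible hits. A real pipeline would more likely read sequences with an established parser than with the minimal FASTA reader used here.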

