cazy_webscraper: local compilation and interrogation of comprehensive CAZyme datasets.

Affiliations

School of Biology and Biomedical Sciences Research Complex, University of St Andrews, North Haugh, St Andrews, Fife, KY16 9ST, UK.

Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, G4 0RE, UK.

Publication information

Microb Genom. 2023 Aug;9(8). doi: 10.1099/mgen.0.001086.

Abstract

Carbohydrate-active enzymes (CAZymes) are pivotal in biological processes including energy metabolism, cell structure maintenance, signalling, and pathogen recognition. Bioinformatic prediction and mining of CAZymes improve our understanding of these activities and enable discovery of candidates of interest for industrial biotechnology, particularly the processing of organic waste for biofuel production. CAZy (www.cazy.org) is a high-quality, manually curated, and authoritative database of CAZymes that is often the starting point for these analyses. Automated querying of CAZy data, and its integration with other public datasets, would constitute a powerful resource for mining and exploring CAZyme diversity. However, CAZy does not itself provide methods to automate queries or to integrate annotation data from other sources (except by following hyperlinks) to support further analysis. To overcome these limitations we developed cazy_webscraper, a command-line tool that retrieves data from CAZy and other online resources to build a local, shareable and reproducible database that augments and extends the authoritative CAZy database. By integrating curated CAZyme annotations with their corresponding protein sequences, up-to-date taxonomy assignments, and protein structure data, cazy_webscraper facilitates automated large-scale and targeted bioinformatic CAZyme family analysis and candidate screening. This tool has found widespread uptake in the community, with over 35 000 downloads (from April 2021 to June 2023). We demonstrate the use and application of cazy_webscraper to: (i) augment, update and correct CAZy database accessions; (ii) explore the taxonomic distribution of CAZymes recorded in CAZy, identifying under-represented taxa and unusual CAZy class distributions; and (iii) investigate three CAZymes that have potential biotechnological applications in biomass degradation but lack a representative structure in the PDB.
We describe in general how cazy_webscraper facilitates functional, structural and evolutionary studies to aid identification of candidate enzymes for further characterization, and specifically note that CAZy provides supporting evidence for recent expansion of the Auxiliary Activities (AA) CAZy family in eukaryotes, consistent with functions potentially specific to eukaryotic lifestyles.
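As an illustration of the kind of local interrogation described above, the sketch below counts distinct proteins per CAZy class within each taxonomic kingdom, the sort of query behind exploring the taxonomic distribution of CAZy classes. It assumes the local database is a SQLite file and uses a deliberately minimal, hypothetical schema (tables `proteins` and `family_annotations`, and the helper names `cazy_class` and `class_counts_by_kingdom`); none of these are cazy_webscraper's actual schema or API, so table and column names would need adapting to a real database.

```python
import re
import sqlite3
from collections import Counter

# HYPOTHETICAL minimal schema for illustration only; it is not
# cazy_webscraper's real database schema.
SCHEMA = """
CREATE TABLE proteins (
    accession TEXT PRIMARY KEY,  -- protein accession
    kingdom   TEXT NOT NULL      -- taxonomic kingdom of the source organism
);
CREATE TABLE family_annotations (
    accession TEXT NOT NULL REFERENCES proteins (accession),
    family    TEXT NOT NULL,     -- CAZy family name, e.g. 'GH5', 'AA9', 'CBM50'
    PRIMARY KEY (accession, family)
);
"""


def cazy_class(family: str) -> str:
    """Return the CAZy class prefix of a family name, e.g. 'GH5' -> 'GH'."""
    match = re.match(r"[A-Z]+", family)
    if match is None:
        raise ValueError(f"unrecognised CAZy family name: {family!r}")
    return match.group(0)


def class_counts_by_kingdom(conn: sqlite3.Connection) -> Counter:
    """Count distinct proteins per (kingdom, CAZy class).

    A protein annotated with several families of the same class
    (e.g. GH5 and GH6) is counted only once for that class.
    """
    rows = conn.execute(
        "SELECT p.kingdom, f.accession, f.family "
        "FROM family_annotations AS f "
        "JOIN proteins AS p ON f.accession = p.accession"
    )
    # Deduplicate on (kingdom, class, protein) before counting.
    seen = {(kingdom, cazy_class(family), acc) for kingdom, acc, family in rows}
    return Counter((kingdom, cls) for kingdom, cls, _ in seen)


if __name__ == "__main__":
    # Made-up placeholder rows, not real CAZy data.
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO proteins VALUES (?, ?)",
        [("P1", "Bacteria"), ("P2", "Eukaryota"), ("P3", "Eukaryota")],
    )
    conn.executemany(
        "INSERT INTO family_annotations VALUES (?, ?)",
        [("P1", "GH5"), ("P1", "GH6"), ("P2", "AA9"), ("P3", "AA9"), ("P3", "CBM1")],
    )
    for (kingdom, cls), n in sorted(class_counts_by_kingdom(conn).items()):
        print(kingdom, cls, n)
```

Counting at the class level (GH, GT, PL, CE, AA, CBM) rather than per family is a design choice for this sketch; using `family` in place of `cazy_class(family)` in the set comprehension would give per-family counts instead.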
