SamPler - a novel method for selecting parameters for gene functional annotation routines.

Affiliations

Centre of Biological Engineering, University of Minho, 4710-057, Braga, Portugal.

Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, 2780-157, Oeiras, Portugal.

Publication information

BMC Bioinformatics. 2019 Sep 5;20(1):454. doi: 10.1186/s12859-019-3038-4.

PMID: 31488049

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6727554/

Abstract

BACKGROUND

As genome sequencing projects grow rapidly, the diversity of organisms with recently assembled genome sequences has reached an unprecedented scale, highlighting the need for fast and efficient gene functional annotation. However, the high quality of such annotations must be guaranteed, as they are the first indicator of the genomic potential of every organism. Automatic procedures help accelerate the annotation process, but they decrease the confidence and reliability of the outcomes. Manually curating a genome-wide annotation of gene, enzyme and transporter protein functions is a highly time-consuming, tedious and impractical task, even for the most proficient curator. Hence, a semi-automated procedure that balances the two approaches will increase the reliability of the annotation while speeding up the process. In fact, a prior analysis of the annotation algorithm may improve its performance by tuning its parameters, hastening the downstream processing and the manual curation involved in assigning functions to protein-encoding genes.

RESULTS

Here, SamPler, a novel strategy for selecting parameters for gene functional annotation routines, is presented. This semi-automated method is based on the manual curation of a randomly selected set of genes/proteins. This sample is then used to assess, in a multi-dimensional array, the automatic annotations for all possible combinations of the algorithm's parameters. These assessments yield an array of confusion matrices, for which several metrics are calculated (accuracy, precision and negative predictive value) and used to reach optimal values for the parameters.
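The parameter sweep described above can be illustrated with a minimal sketch. It is not the merlin plugin's implementation: the two parameters (`alpha`, `threshold`), the two score components, the toy sample and the choice of accuracy as the selection criterion are all assumptions made for illustration. Only the overall idea (one confusion matrix per parameter combination, scored against a manually curated sample with accuracy, precision and negative predictive value) comes from the abstract.

```python
from itertools import product
from typing import NamedTuple


class Gene(NamedTuple):
    """One manually curated gene from the random sample (hypothetical layout)."""
    frequency_score: float  # hypothetical score component in [0, 1]
    taxonomy_score: float   # hypothetical score component in [0, 1]
    auto_is_correct: bool   # does the automatic annotation match the curator's?


def confusion_matrix(sample, alpha, threshold):
    """Count (TP, FP, TN, FN) for one (alpha, threshold) combination."""
    tp = fp = tn = fn = 0
    for gene in sample:
        # Assumed scoring rule: a weighted blend of the two components.
        score = alpha * gene.frequency_score + (1 - alpha) * gene.taxonomy_score
        accepted = score >= threshold
        if accepted and gene.auto_is_correct:
            tp += 1   # accepted and correct
        elif accepted:
            fp += 1   # accepted but wrong
        elif gene.auto_is_correct:
            fn += 1   # rejected although correct
        else:
            tn += 1   # rejected and wrong
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Accuracy, precision and negative predictive value (0.0 if undefined)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "npv": tn / (tn + fn) if tn + fn else 0.0,
    }


def evaluate_grid(sample, alphas, thresholds):
    """Map every parameter combination to its metrics."""
    return {
        (a, t): metrics(*confusion_matrix(sample, a, t))
        for a, t in product(alphas, thresholds)
    }


# Toy curated sample (fabricated for illustration only, not data from the paper).
sample = [
    Gene(0.9, 0.8, True),
    Gene(0.8, 0.4, True),
    Gene(0.6, 0.2, False),
    Gene(0.2, 0.7, False),
    Gene(0.1, 0.1, False),
    Gene(0.7, 0.9, True),
]

grid = evaluate_grid(sample, alphas=[0.0, 0.5, 1.0], thresholds=[0.3, 0.5, 0.7])
# Assumed selection rule: pick the combination with the highest accuracy.
best = max(grid, key=lambda params: grid[params]["accuracy"])
print(best, grid[best])
```

In practice the selection rule would likely weigh precision and negative predictive value as well as accuracy; the single-metric `max` above is only the simplest possible choice.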

CONCLUSIONS

The potential of this methodology is demonstrated with four genome functional annotations performed in merlin, an in-house, user-friendly computational framework for genome-scale metabolic annotation and model reconstruction. To that end, SamPler was implemented as a new plugin for the merlin tool.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6d7b/6727554/64bd63c8239c/12859_2019_3038_Fig1_HTML.jpg
