

Curation accuracy of model organism databases.

Author information

Keseler Ingrid M, Skrzypek Marek, Weerasinghe Deepika, Chen Albert Y, Fulcher Carol, Li Gene-Wei, Lemmer Kimberly C, Mladinich Katherine M, Chow Edmond D, Sherlock Gavin, Karp Peter D

Affiliations

Bioinformatics Research Group, Artificial Intelligence Center, SRI International, CA, USA, Department of Genetics, Stanford University, CA 94305, USA, Department of Bacteriology, University of Wisconsin, WI 53706-1521, USA, Department of Cellular and Molecular Pharmacology, University of California at San Francisco, CA 94158-2140, USA, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, WI 53726, USA and Department of Medical Microbiology and Immunology, University of Wisconsin, WI 53706-1521, USA.


Publication information

Database (Oxford). 2014 Jun 12;2014. doi: 10.1093/database/bau058. Print 2014.

Abstract

Manual extraction of information from the biomedical literature, or biocuration, is the central methodology used to construct many biological databases. For example, the UniProt protein database, the EcoCyc Escherichia coli database and the Candida Genome Database (CGD) are all based on biocuration. Life science researchers use biological databases extensively: as online encyclopedias, as aids in interpreting new experimental data and as gold standards for developing new bioinformatics algorithms. Although manual curation has been assumed to be highly accurate, we are aware of only one previous study of biocuration accuracy. We assessed the accuracy of EcoCyc and CGD by manually selecting curated assertions within randomly chosen EcoCyc and CGD gene pages and then validating that the data found in the referenced publications supported those assertions. A database assertion is considered to be in error if it could not be found in the publication cited for it. We identified 10 errors in the 633 facts that we validated across the two databases, for an overall error rate of 1.58%, and individual error rates of 1.82% for CGD and 1.40% for EcoCyc. These data suggest that manual curation of the experimental literature by PhD-level scientists is highly accurate. Database URL: http://ecocyc.org/, http://www.candidagenome.org//
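The overall error rate quoted in the abstract follows directly from the two reported counts (10 errors in 633 validated facts). The minimal sketch below reproduces that arithmetic. The Wilson score interval is an illustrative addition that the abstract does not report, and the function names and the 95% level (z = 1.96) are assumptions made here. The per-database rates are not recomputed because the abstract does not give per-database counts.

```python
import math


def error_rate(errors: int, total: int) -> float:
    """Fraction of validated assertions that were found to be in error."""
    return errors / total


def wilson_interval(errors: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Not part of the abstract: an assumed way to express the uncertainty
    of a rate estimated from a small number of errors.
    """
    p = errors / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# Counts taken from the abstract: 10 errors among 633 validated facts.
rate = error_rate(10, 633)
low, high = wilson_interval(10, 633)

print(f"overall error rate: {rate:.2%}")  # prints "overall error rate: 1.58%"
print(f"approximate 95% interval: {low:.2%} to {high:.2%}")
```

With only 10 observed errors the interval is fairly wide relative to the point estimate, so it is worth keeping in mind when comparing the two per-database rates.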




