Automated assessment of biological database assertions using the scientific literature.

Affiliations

Department of Mechanical & Industrial Engineering, University of Toronto, Toronto, M5S 3G8, Canada.

School of Computing and Information Systems, University of Melbourne, Melbourne, 3010, Australia.

Publication information

BMC Bioinformatics. 2019 Apr 29;20(1):216. doi: 10.1186/s12859-019-2801-x.

DOI:10.1186/s12859-019-2801-x
PMID:31035936
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6489365/
Abstract

BACKGROUND

The large biological databases such as GenBank contain vast numbers of records, the content of which is substantively based on external resources, including published literature. Manual curation is used to establish whether the literature and the records are indeed consistent. We explore in this paper an automated method for assessing the consistency of biological assertions, to assist biocurators, which we call BARC, Biocuration tool for Assessment of Relation Consistency. In this method a biological assertion is represented as a relation between two objects (for example, a gene and a disease); we then use our novel set-based relevance algorithm SaBRA to retrieve pertinent literature, and apply a classifier to estimate the likelihood that this relation (assertion) is correct.
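The three steps named above (represent the assertion as a relation, retrieve pertinent literature, classify) can be illustrated with a toy sketch. This is not SaBRA or the BARC classifier: the abstract does not describe their internals, so the term-overlap retrieval, the two features, and the logistic weights below are all hypothetical stand-ins chosen only to show the data flow.

```python
import math
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """A database assertion modelled as a relation between two objects."""
    subject: str  # e.g. a gene name (single token in this toy example)
    obj: str      # e.g. a disease name (single token in this toy example)


def tokens(text):
    """Lowercased alphanumeric tokens of a document, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(assertion, corpus):
    """Hypothetical stand-in for retrieval: keep documents that mention at
    least one of the two objects, ranked by how many of them they mention."""
    terms = {assertion.subject.lower(), assertion.obj.lower()}
    scored = [(len(terms & tokens(doc)), doc) for doc in corpus]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def features(scored):
    """Two illustrative features: the number of retrieved documents that
    mention both objects, and their share of all retrieved documents."""
    total = len(scored)
    both = sum(1 for score, _ in scored if score == 2)
    return [both, both / total if total else 0.0]


def likelihood(feats, weights=(1.0, 2.0), bias=-2.0):
    """Logistic score with made-up weights; a real system would learn them."""
    z = bias + sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))


corpus = [
    "BRCA1 mutations increase breast cancer risk",
    "BRCA1 is a tumor suppressor",
    "Weather report for Tuesday",
]
supported = likelihood(features(retrieve(Assertion("BRCA1", "cancer"), corpus)))
unsupported = likelihood(features(retrieve(Assertion("BRCA1", "malaria"), corpus)))
print(supported > unsupported)
```

In this sketch an assertion whose two objects co-occur in the retrieved documents scores higher than one whose objects never appear together, which is the intuition behind using literature as evidence for consistency.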

RESULTS

Our experiments on assessing gene-disease relations and protein-protein interactions using the PubMed Central collection show that BARC can be effective at assisting curators to perform data cleansing. Specifically, the results obtained showed that BARC substantially outperforms the best baselines, with an improvement of F-measure of 3.5% and 13%, respectively, on gene-disease relations and protein-protein interactions. We have additionally carried out a feature analysis that showed that all feature types are informative, as are all fields of the documents.
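For readers unfamiliar with the metric, the F-measure reported above combines precision and recall. The sketch below computes it from gold and predicted labels. It assumes binary labels and the balanced form (beta = 1) by default; the abstract does not state which variant was used, nor whether the 3.5% and 13% gains are absolute or relative, so the function name and defaults here are illustrative only.

```python
def f_measure(gold, predicted, beta=1.0):
    """F-measure for binary labels (truthy = assertion judged correct)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # true positives
    fp = sum(1 for g, p in pairs if not g and p)    # false positives
    fn = sum(1 for g, p in pairs if g and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


gold = [1, 1, 0, 0]
predicted = [1, 0, 1, 0]
print(f_measure(gold, predicted))
```

With one true positive, one false positive, and one false negative, precision and recall are both 0.5, so the balanced F-measure is also 0.5.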

CONCLUSIONS

BARC provides a clear benefit for the biocuration community, as there are no prior automated tools for identifying inconsistent assertions in large-scale biological databases.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbbd/6489365/95ddd24906d6/12859_2019_2801_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbbd/6489365/6cfdcfc9703f/12859_2019_2801_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbbd/6489365/e401ade6ba1f/12859_2019_2801_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbbd/6489365/ace85aaad84d/12859_2019_2801_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbbd/6489365/aa94f3fce009/12859_2019_2801_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbbd/6489365/be155dce1e82/12859_2019_2801_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbbd/6489365/ebd55b983ba5/12859_2019_2801_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbbd/6489365/5c5e750dcea4/12859_2019_2801_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbbd/6489365/c4b281f24797/12859_2019_2801_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbbd/6489365/ede4acb9091c/12859_2019_2801_Fig10_HTML.jpg

Similar articles

1
Automated assessment of biological database assertions using the scientific literature.
BMC Bioinformatics. 2019 Apr 29;20(1):216. doi: 10.1186/s12859-019-2801-x.
2
Automated detection of records in biological sequence databases that are inconsistent with the literature.
J Biomed Inform. 2017 Jul;71:229-240. doi: 10.1016/j.jbi.2017.06.015. Epub 2017 Jun 15.
3
PubTator: a web-based text mining tool for assisting biocuration.
Nucleic Acids Res. 2013 Jul;41(Web Server issue):W518-22. doi: 10.1093/nar/gkt441. Epub 2013 May 22.
4
BioReader: a text mining tool for performing classification of biomedical literature.
BMC Bioinformatics. 2019 Feb 4;19(Suppl 13):57. doi: 10.1186/s12859-019-2607-x.
5
Curation accuracy of model organism databases.
Database (Oxford). 2014 Jun 12;2014. doi: 10.1093/database/bau058. Print 2014.
6
Integrating image caption information into biomedical document classification in support of biocuration.
Database (Oxford). 2020 Jan 1;2020. doi: 10.1093/database/baaa024.
7
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
8
Text mining for the biocuration workflow.
Database (Oxford). 2012 Apr 18;2012:bas020. doi: 10.1093/database/bas020. Print 2012.
9
Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine.
Database (Oxford). 2019 Jan 1;2019:bay147. doi: 10.1093/database/bay147.
10
Accelerating annotation of articles via automated approaches: evaluation of the neXtA5 curation-support tool by neXtProt.
Database (Oxford). 2018 Jan 1;2018:bay129. doi: 10.1093/database/bay129.