Semi-automatic semantic annotation of PubMed queries: a study on quality, efficiency, satisfaction.

Affiliation

National Center for Biotechnology Information, US National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA.

Publication information

J Biomed Inform. 2011 Apr;44(2):310-8. doi: 10.1016/j.jbi.2010.11.001. Epub 2010 Nov 20.

DOI:10.1016/j.jbi.2010.11.001
PMID:21094696
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3063330/
Abstract

Information processing algorithms require significant amounts of annotated data for training and testing. The availability of such data is often hindered by the complexity and high cost of production. In this paper, we investigate the benefits of a state-of-the-art tool to help with the semantic annotation of a large set of biomedical queries. Seven annotators were recruited to annotate a set of 10,000 PubMed® queries with 16 biomedical and bibliographic categories. About half of the queries were annotated from scratch, while the other half were automatically pre-annotated and manually corrected. The impact of the automatic pre-annotations was assessed on several aspects of the task: time, number of actions, annotator satisfaction, inter-annotator agreement, quality and number of the resulting annotations. The analysis of annotation results showed that the number of required hand annotations is 28.9% less when using pre-annotated results from automatic tools. As a result, the overall annotation time was substantially lower when pre-annotations were used, while inter-annotator agreement was significantly higher. In addition, there was no statistically significant difference in the semantic distribution or number of annotations produced when pre-annotations were used. The annotated query corpus is freely available to the research community. This study shows that automatic pre-annotations are found helpful by most annotators. Our experience suggests using an automatic tool to assist large-scale manual annotation projects. This helps speed-up the annotation time and improve annotation consistency while maintaining high quality of the final annotations.
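The two headline quantities in the abstract, a relative reduction in hand annotations and an inter-annotator agreement score, can be illustrated with a minimal Python sketch. The abstract does not say which agreement measure the authors used, so Cohen's kappa for two annotators is an assumption here. The function names, the counts, and the category labels ("Gene", "Disease", "Author") are hypothetical and are not taken from the paper's corpus.

```python
from collections import Counter
from typing import Sequence


def relative_reduction(baseline: float, assisted: float) -> float:
    """Percentage by which `assisted` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - assisted) / baseline


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: one category per item; the study's real annotation
    scheme and agreement measure may differ.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts chosen only to reproduce a 28.9% reduction.
reduction = relative_reduction(baseline=1000, assisted=711)

# Hypothetical labels for four queries from two annotators.
annotator_1 = ["Gene", "Disease", "Gene", "Author"]
annotator_2 = ["Gene", "Disease", "Author", "Author"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Comparing `kappa` for the from-scratch half against the pre-annotated half would be one way to check the kind of agreement difference the abstract reports.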

Figures (images from the PMC full text)

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5df0/3063330/53cc27611d2d/nihms-256146-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5df0/3063330/c4670098d85f/nihms-256146-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5df0/3063330/e4c89cc0560e/nihms-256146-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5df0/3063330/06531b0763ff/nihms-256146-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5df0/3063330/50a1a30c162f/nihms-256146-f0005.jpg
