Semantic annotation of consumer health questions.

Affiliation

Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD, USA.

Publication information

BMC Bioinformatics. 2018 Feb 6;19(1):34. doi: 10.1186/s12859-018-2045-1.

DOI:10.1186/s12859-018-2045-1
PMID:29409442
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5802048/
Abstract

BACKGROUND

Consumers increasingly use online resources for their health information needs. While current search engines can address these needs to some extent, they generally do not take into account that most health information needs are complex and can only fully be expressed in natural language. Consumer health question answering (QA) systems aim to fill this gap. A major challenge in developing consumer health QA systems is extracting relevant semantic content from the natural language questions (question understanding). To develop effective question understanding tools, question corpora semantically annotated for relevant question elements are needed. In this paper, we present a two-part consumer health question corpus annotated with several semantic categories: named entities, question triggers/types, question frames, and question topic. The first part (CHQA-email) consists of relatively long email requests received by the U.S. National Library of Medicine (NLM) customer service, while the second part (CHQA-web) consists of shorter questions posed to MedlinePlus search engine as queries. Each question has been annotated by two annotators. The annotation methodology is largely the same between the two parts of the corpus; however, we also explain and justify the differences between them. Additionally, we provide information about corpus characteristics, inter-annotator agreement, and our attempts to measure annotation confidence in the absence of adjudication of annotations.
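The four annotation layers named above (named entities, question triggers/types, question frames, and question topic) can be pictured as a small record per question. The following is only an illustrative sketch: the class names, field names, the `find_span` helper, the labels, and the example question are all hypothetical and are not taken from the corpus or its release format. In particular, modeling a frame as a trigger linked to a theme entity is an assumption about what a frame contains.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """A labeled character span in the question text (hypothetical shape)."""
    start: int
    end: int
    label: str


@dataclass
class Frame:
    """Assumed frame shape: a question type, its trigger span, and a theme entity."""
    question_type: str
    trigger: Span
    theme: Span


@dataclass
class AnnotatedQuestion:
    text: str
    source: str  # "CHQA-email" or "CHQA-web"
    entities: List[Span] = field(default_factory=list)
    frames: List[Frame] = field(default_factory=list)
    topic: Optional[Span] = None

    def span_text(self, span: Span) -> str:
        """Return the surface text covered by a span."""
        return self.text[span.start:span.end]


def find_span(text: str, phrase: str, label: str) -> Span:
    """Build a Span for the first occurrence of phrase (illustration helper)."""
    start = text.index(phrase)
    return Span(start, start + len(phrase), label)


# Invented example question, not drawn from the corpus.
text = "What is the treatment for migraine?"
problem = find_span(text, "migraine", "PROBLEM")
trigger = find_span(text, "treatment", "TRIGGER")

question = AnnotatedQuestion(
    text=text,
    source="CHQA-web",
    entities=[problem],
    frames=[Frame(question_type="TREATMENT", trigger=trigger, theme=problem)],
    topic=problem,
)
```

Storing character offsets rather than copied strings keeps every layer anchored to the same question text, which is what makes span-level comparison between two annotators straightforward.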

RESULTS

The resulting corpus consists of 2614 questions (CHQA-email: 1740, CHQA-web: 874). Problems are the most frequent named entities, while treatment and general information questions are the most common question types. Inter-annotator agreement was generally modest: question types and topics yielded highest agreement, while the agreement for more complex frame annotations was lower. Agreement in CHQA-web was consistently higher than that in CHQA-email. Pairwise inter-annotator agreement proved most useful in estimating annotation confidence.
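The abstract does not say which agreement measure was used, so the following is a minimal sketch under stated assumptions rather than the authors' procedure. It assumes each annotation is a `(start, end, label)` tuple and scores pairwise agreement as exact-match F1 between two annotators' sets for one question. The function name `pairwise_f1`, the labels, and the sample annotations are hypothetical.

```python
from typing import Set, Tuple

# Assumed annotation shape: (start offset, end offset, label).
Annotation = Tuple[int, int, str]


def pairwise_f1(a: Set[Annotation], b: Set[Annotation]) -> float:
    """Exact-match F1 between two annotators' annotation sets for one question.

    Annotator a is treated as the reference; F1 is symmetric, so swapping
    the two arguments gives the same score.
    """
    if not a and not b:
        return 1.0  # neither annotator marked anything: count as agreement
    matched = len(a & b)
    if matched == 0:
        return 0.0
    precision = matched / len(b)
    recall = matched / len(a)
    return 2 * precision * recall / (precision + recall)


# Invented annotations: the annotators disagree on the boundary of the third span.
annotator_1 = {(0, 8, "PROBLEM"), (20, 29, "TREATMENT"), (35, 40, "DRUG")}
annotator_2 = {(0, 8, "PROBLEM"), (20, 29, "TREATMENT"), (35, 42, "DRUG")}
score = pairwise_f1(annotator_1, annotator_2)

# Corpus sizes as reported in the abstract.
corpus_sizes = {"CHQA-email": 1740, "CHQA-web": 874}
total_questions = sum(corpus_sizes.values())
```

Here two of the three annotations match exactly, so precision and recall are both 2/3 and the F1 score is 2/3. The two part sizes sum to the reported 2614 questions. Exact matching is the strictest choice; a partial-overlap criterion would give higher scores on the same data.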

CONCLUSIONS

To our knowledge, our corpus is the first focusing on annotation of uncurated consumer health questions. It is currently used to develop machine learning-based methods for question understanding. We make the corpus publicly available to stimulate further research on consumer health QA.

Figures 1 to 6 (image links only; captions are not included in this record):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/43ff/5802048/d6c5f9a93e88/12859_2018_2045_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/43ff/5802048/6a1aeadd2f31/12859_2018_2045_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/43ff/5802048/de27b5b83049/12859_2018_2045_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/43ff/5802048/e90b84584941/12859_2018_2045_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/43ff/5802048/c5f094f98b8b/12859_2018_2045_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/43ff/5802048/c8ee4a0ae320/12859_2018_2045_Fig6_HTML.jpg
