

Building a semantically annotated corpus for chronic disease complications using two document types.

Affiliation

Department of Computer Science and Engineering, Royal Commission for Jubail and Yanbu, Yanbu University College, Yanbu Industrial City, Saudi Arabia.

Publication information

PLoS One. 2021 Mar 18;16(3):e0247319. doi: 10.1371/journal.pone.0247319. eCollection 2021.

Abstract

Narrative text in electronic health records (EHRs) contains a wealth of information about patient health conditions. In addition, people use Twitter to describe their experiences with personal health issues, such as medical complaints, symptoms, treatments, lifestyle, and other factors. Both genres of text include different types of health-related information concerning disease complications and risk factors. Detailed knowledge of how to control disease risk factors helps to modify those risks and, in turn, to prevent disease complications. Text-mining tools offer an efficient way to extract and integrate vital information about disease complications hidden in large volumes of narrative text. However, developing such tools depends on the availability of an annotated corpus. In response, we have developed the PrevComp corpus, which is annotated with information relevant to identifying disease complications, underlying risk factors, and prevention measures in the context of the interaction between hypertension and diabetes. The corpus is novel both in its highly specific biomedical topic and in its integration of information from EHRs and tweets collected from Twitter. The annotation scheme was designed with guidance from a domain expert, and two further domain experts performed the annotation, resulting in high-quality annotation, with agreement F-scores as high as 0.60 and 0.75 for EHRs and tweets, respectively.
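The agreement F-scores reported above can be illustrated with a small sketch. The abstract does not specify how the scores were computed, so the following assumes one common convention: one annotator's spans are treated as the reference, the other's as the response, and only exact matches on offsets and label count as agreement. The function name, the span representation, and the category labels are hypothetical and are not taken from the PrevComp annotation scheme.

```python
from typing import Iterable, NamedTuple


class Span(NamedTuple):
    """One annotated text span (hypothetical representation)."""
    start: int   # character offset where the span begins
    end: int     # character offset where the span ends
    label: str   # semantic category assigned by the annotator


def agreement_f_score(annotator_a: Iterable[Span],
                      annotator_b: Iterable[Span]) -> float:
    """F-score agreement, treating annotator A as the reference.

    Assumption: a span counts as agreed only when start, end, and
    label all match exactly. Relaxed (overlap) matching would need
    a different comparison.
    """
    a, b = set(annotator_a), set(annotator_b)
    if not a and not b:
        return 1.0  # nothing annotated by either: trivially in agreement
    matches = len(a & b)
    precision = matches / len(b) if b else 0.0
    recall = matches / len(a) if a else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative spans only; the labels are made up for this example.
a = [Span(0, 12, "Disease"), Span(20, 30, "RiskFactor"),
     Span(35, 44, "Complication")]
b = [Span(0, 12, "Disease"), Span(20, 30, "RiskFactor"),
     Span(50, 58, "Prevention")]

score = agreement_f_score(a, b)
print(f"agreement F-score: {score:.2f}")
```

With exact matching the F-score is symmetric, so swapping the two annotators gives the same value; it is also convenient for span annotation because it does not require counting the spans that neither annotator marked.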


