Induced lexico-syntactic patterns improve information extraction from online medical forums.

Affiliation

Department of Computer Science, Stanford University, Stanford, California, USA.

Publication

J Am Med Inform Assoc. 2014 Sep-Oct;21(5):902-9. doi: 10.1136/amiajnl-2014-002669. Epub 2014 Jun 26.

Abstract

OBJECTIVE

To reliably extract two entity types, namely symptoms and conditions (SCs) and drugs and treatments (DTs), from patient-authored text (PAT) by learning lexico-syntactic patterns from data annotated with seed dictionaries.

BACKGROUND AND SIGNIFICANCE

Despite the increasing quantity of PAT (eg, online discussion threads), tools for identifying medical entities in PAT are limited. When applied to PAT, existing tools either fail to identify specific entity types or perform poorly. Identifying SC and DT terms in PAT would enable exploration of the efficacy and side effects not only of pharmaceutical drugs but also of home remedies and components of daily care.

MATERIALS AND METHODS

We use SC and DT term dictionaries compiled from online sources to label several discussion forums from MedHelp (http://www.medhelp.org). We then iteratively induce lexico-syntactic patterns corresponding strongly to each entity type to extract new SC and DT terms.
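The dictionary-labeling and iterative pattern-induction loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a "pattern" here is only the two tokens to the left of a term (the paper's lexico-syntactic patterns are richer), terms are single tokens (so a phrase such as 'cinnamon pills' would yield only 'cinnamon'), a pattern's score is a plain precision ratio, and the function names `bootstrap` and `left_context`, the thresholds `min_precision`, `min_support`, and `top_k`, and the toy sentences are all hypothetical.

```python
from collections import Counter
from typing import Optional

# Hypothetical simplification: a "pattern" is the two tokens immediately left
# of a term, e.g. ("i", "take") in "i take metformin".
Pattern = tuple[str, str]


def left_context(tokens: list[str], i: int) -> Optional[Pattern]:
    """Return the two tokens preceding position i, or None if there are fewer than two."""
    if i < 2:
        return None
    return (tokens[i - 2], tokens[i - 1])


def bootstrap(
    sentences: list[list[str]],
    seeds: set[str],
    other_terms: set[str],
    iterations: int = 3,
    min_precision: float = 0.6,  # illustrative threshold, not from the paper
    min_support: int = 2,        # illustrative threshold, not from the paper
    top_k: int = 1,              # new patterns accepted per round (assumed)
) -> tuple[set[str], set[Pattern]]:
    """Grow a single-token term dictionary from seeds; return (new terms, learned patterns)."""
    known = set(seeds)
    learned: set[Pattern] = set()
    for _ in range(iterations):
        # 1. For each context, count the tokens it precedes (total) and how many
        #    of them are currently labeled as the target entity type (pos).
        pos: Counter = Counter()
        total: Counter = Counter()
        for tokens in sentences:
            for i, tok in enumerate(tokens):
                pat = left_context(tokens, i)
                if pat is None:
                    continue
                total[pat] += 1
                if tok in known:
                    pos[pat] += 1
        # 2. Accept the best not-yet-learned patterns that are frequent and precise enough.
        candidates = [
            (pos[p] / total[p], pos[p], p)
            for p in pos
            if p not in learned
            and pos[p] >= min_support
            and pos[p] / total[p] >= min_precision
        ]
        candidates.sort(key=lambda c: (-c[0], -c[1], c[2]))
        new_patterns = [p for _, _, p in candidates[:top_k]]
        if not new_patterns:
            break  # no remaining pattern passes the thresholds
        learned.update(new_patterns)
        # 3. Apply all learned patterns; a matched token becomes a new term
        #    unless it is already known or belongs to the other entity type.
        for tokens in sentences:
            for i, tok in enumerate(tokens):
                if left_context(tokens, i) not in learned:
                    continue
                if tok not in known and tok not in other_terms:
                    known.add(tok)
    return known - seeds, learned


# Toy sentences invented for illustration (not MedHelp data), lowercased and
# whitespace-tokenized. Seeds act as a tiny DT dictionary; "neuropathy" stands
# in for a known SC term that must not be extracted as a DT.
sentences = [s.split() for s in [
    "i take metformin daily",
    "i take insulin at night",
    "i take cinnamon pills",
    "she started taking metformin",
    "i started taking cinnamon",
    "he started taking turmeric",
    "i have neuropathy in my feet",
]]
new_terms, patterns = bootstrap(sentences, seeds={"metformin", "insulin"}, other_terms={"neuropathy"})
print(sorted(new_terms))  # ['cinnamon', 'turmeric']
print(sorted(patterns))   # [('i', 'take'), ('started', 'taking')]
```

The pattern ("started", "taking") is accepted only in the second round, after 'cinnamon' has been learned in the first; that chaining is what makes the procedure iterative. A fuller system would also score candidate terms before accepting them and filter stopwords and other noise.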

RESULTS

Our system extracts symptom descriptions and treatments absent from our original dictionaries, such as 'LADA', 'stabbing pain', and 'cinnamon pills'. On two forums from MedHelp, it extracts DT terms with an F1 score of 58-70% and SC terms with an F1 score of 66-76%. We show improvements over MetaMap, OBA (the Open Biomedical Annotator), a conditional random field-based classifier, and a previous pattern learning approach.
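The F1 figures above are the harmonic mean of precision and recall. The helper below shows that arithmetic, assuming each extracted term is checked for exact membership in a gold-standard set; the abstract does not say how matches were counted, and the function name and example terms are illustrative only.

```python
def precision_recall_f1(extracted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 of extracted terms against a gold set (assumed criterion)."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    # F1 is the harmonic mean of precision and recall: 2PR / (P + R).
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example: 2 of 3 extracted terms are correct, and 2 of 4 gold terms are found.
p, r, f1 = precision_recall_f1(
    {"cinnamon", "turmeric", "daily"},
    {"cinnamon", "turmeric", "insulin", "metformin"},
)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.667 R=0.500 F1=0.571
```

Because F1 is a harmonic mean, it stays close to the lower of the two scores, so a system cannot reach a high F1 through high recall alone.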

CONCLUSIONS

Our entity extractor based on lexico-syntactic patterns is a successful technique for identifying specific entity types in PAT and is preferable to the existing tools we compared against. To the best of our knowledge, this is the first paper to extract SC and DT entities from PAT. We show that the system learns informal terms that are often used in PAT but missing from typical dictionaries.
