Induced lexico-syntactic patterns improve information extraction from online medical forums.

Affiliation

Department of Computer Science, Stanford University, Stanford, California, USA.

Publication information

J Am Med Inform Assoc. 2014 Sep-Oct;21(5):902-9. doi: 10.1136/amiajnl-2014-002669. Epub 2014 Jun 26.

DOI: 10.1136/amiajnl-2014-002669
PMID: 24970840
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4147618/

Abstract

OBJECTIVE

To reliably extract two entity types, symptoms and conditions (SCs), and drugs and treatments (DTs), from patient-authored text (PAT) by learning lexico-syntactic patterns from data annotated with seed dictionaries.

BACKGROUND AND SIGNIFICANCE

Despite the increasing quantity of PAT (eg, online discussion threads), tools for identifying medical entities in PAT are limited. When applied to PAT, existing tools either fail to identify specific entity types or perform poorly. Identifying SC and DT terms in PAT would enable exploration of efficacy and side effects not only for pharmaceutical drugs but also for home remedies and components of daily care.

MATERIALS AND METHODS

We use SC and DT term dictionaries compiled from online sources to label several discussion forums from MedHelp (http://www.medhelp.org). We then iteratively induce lexico-syntactic patterns corresponding strongly to each entity type to extract new SC and DT terms.
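The two steps described here, dictionary labeling followed by iterative pattern induction, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the single-token terms, the whitespace tokenization, the one-word-each-side context pattern, and the precision and support thresholds are all assumptions made for illustration.

```python
from collections import Counter

def context_pattern(tokens, i):
    """Hypothetical pattern: the word before and the word after position i."""
    left = tokens[i - 1] if i > 0 else "<s>"
    right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return (left, right)

def induce_terms(sentences, seeds, iterations=5, min_precision=0.6, min_support=2):
    """Bootstrap new terms from seed terms via context patterns (toy version)."""
    known = set(seeds)
    for _ in range(iterations):
        matched = Counter()  # times a pattern surrounds a known term
        total = Counter()    # times a pattern occurs at all
        for tokens in sentences:
            for i, tok in enumerate(tokens):
                pat = context_pattern(tokens, i)
                total[pat] += 1
                if tok in known:
                    matched[pat] += 1
        # Keep patterns that mostly surround known terms (assumed scoring rule).
        good = {p for p in matched
                if matched[p] >= min_support and matched[p] / total[p] >= min_precision}
        # Any unknown token found inside a good pattern becomes a new term.
        new_terms = {tok
                     for tokens in sentences
                     for i, tok in enumerate(tokens)
                     if tok not in known and context_pattern(tokens, i) in good}
        if not new_terms:
            break
        known |= new_terms
    return known - set(seeds)

sentences = [
    "i take metformin for diabetes".split(),
    "i take insulin for diabetes".split(),
    "i take cinnamon for diabetes".split(),
    "i take walks every morning".split(),
]
print(induce_terms(sentences, {"metformin", "insulin"}))
```

In this toy corpus the pattern "take _ for" surrounds the two seed terms often enough to be trusted, so it picks up "cinnamon", while "walks" is left out because its context differs. A practical system would also need multi-word phrases and safeguards against drift over many iterations.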

RESULTS

Our system extracts symptom descriptions and treatments absent from our original dictionaries, such as 'LADA', 'stabbing pain', and 'cinnamon pills'. On two forums from MedHelp, it extracts DT terms with 58-70% F1 score and SC terms with 66-76% F1 score. We show improvements over MetaMap, OBA, a conditional random field-based classifier, and a previous pattern learning approach.
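For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it over sets of extracted and gold-standard terms. The function name and the example terms are hypothetical, and the abstract does not state whether the paper counts matches at the token or the phrase level, so set-level matching is an assumption here.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 over two collections of terms."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up example: 2 of 4 extracted terms are correct, 2 of 3 gold terms are found.
extracted = {"lada", "stabbing pain", "weekend", "doctor"}
gold_terms = {"lada", "stabbing pain", "nausea"}
p, r, f1 = precision_recall_f1(extracted, gold_terms)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```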

CONCLUSIONS

Our entity extractor based on lexico-syntactic patterns is a successful and preferable technique for identifying specific entity types in PAT. To the best of our knowledge, this is the first paper to extract SC and DT entities from PAT. We exhibit learning of informal terms often used in PAT but missing from typical dictionaries.
