Dialect topic modeling for improved consumer medical search.

Authors

Crain Steven P, Yang Shuang-Hong, Zha Hongyuan, Jiao Yu

Affiliation

Georgia Institute of Technology, Atlanta, GA.

Publication

AMIA Annu Symp Proc. 2010 Nov 13;2010:132-6.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3041409/

Abstract

Access to health information by consumers is hampered by a fundamental language gap. Current attempts to close the gap leverage consumer oriented health information, which does not, however, have good coverage of slang medical terminology. In this paper, we present a Bayesian model to automatically align documents with different dialects (slang, common and technical) while extracting their semantic topics. The proposed diaTM model enables effective information retrieval, even when the query contains slang words, by explicitly modeling the mixtures of dialects in documents and the joint influence of dialects and topics on word selection. Simulations using consumer questions to retrieve medical information from a corpus of medical documents show that diaTM achieves a 25% improvement in information retrieval relevance by nDCG@5 over an LDA baseline.
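The abstract reports its retrieval result as nDCG@5 (normalized discounted cumulative gain over the top five results). As a minimal sketch of how that metric is commonly computed, the Python below uses the linear-gain form with a log2 rank discount. The abstract does not say which nDCG variant or relevance scale the authors used, so the function names, the gain formula, and the example relevance grades are all assumptions for illustration. The sketch also derives the ideal ordering from the same judged list, which is a further simplification.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k results.

    `relevances` holds graded relevance judgments in ranked order.
    Assumption: linear gain (rel) with a log2(rank + 1) discount,
    where rank is 1-based.
    """
    return sum(
        rel / math.log2(position + 2)  # position is 0-based, so rank + 1 == position + 2
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k=5):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering.

    Returns 0.0 when no judged result is relevant, to avoid dividing by zero.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(relevances, k) / ideal


if __name__ == "__main__":
    # Hypothetical relevance grades for one query's ranked results
    # (not taken from the paper).
    ranked_grades = [2, 0, 3, 1, 0, 2]
    print(f"nDCG@5 = {ndcg_at_k(ranked_grades, k=5):.3f}")
```

A ranking that already lists results in descending relevance scores 1.0, and a system-level figure is usually the mean of this value over all test queries.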

Similar articles

1

Dialect topic modeling for improved consumer medical search.

AMIA Annu Symp Proc. 2010 Nov 13;2010:132-6.

2

Statistical modeling of biomedical corpora: mining the Caenorhabditis Genetic Center Bibliography for genes related to life span.

BMC Bioinformatics. 2006 May 8;7:250. doi: 10.1186/1471-2105-7-250.

3

Framing Electronic Medical Records as Polylingual Documents in Query Expansion.

AMIA Annu Symp Proc. 2018 Apr 16;2017:940-949. eCollection 2017.

4

Querying EHRs with a Semantic and Entity-Oriented Query Language.

Stud Health Technol Inform. 2017;235:121-125.

5

Use of consumer health vocabularies in online physician directory to improve physician search.

AMIA Annu Symp Proc. 2008 Nov 6:974.

6

iSMART: Ontology-based Semantic query of CDA documents.

AMIA Annu Symp Proc. 2009 Nov 14;2009:375-9.

7

Structuring Legacy Pathology Reports by openEHR Archetypes to Enable Semantic Querying.

Methods Inf Med. 2017 May 18;56(3):230-237. doi: 10.3414/ME16-01-0073. Epub 2017 Feb 28.

8

Evaluation of the Terminology Coverage in the French Corpus LiSSa.

Stud Health Technol Inform. 2017;235:126-130.

9

Consumer health information and question answering: helping consumers find answers to their health-related information needs.

J Am Med Inform Assoc. 2020 Feb 1;27(2):194-201. doi: 10.1093/jamia/ocz152.

10

Semantic annotation for concept-based cross-language medical information retrieval.

Int J Med Inform. 2002 Dec 4;67(1-3):97-112. doi: 10.1016/s1386-5056(02)00058-8.

Cited by

1

Methodologically grounded semantic analysis of large volume of chilean medical literature data applied to the analysis of medical research funding efficiency in Chile.

J Biomed Semantics. 2020 Sep 29;11(1):12. doi: 10.1186/s13326-020-00226-w.

2

Using phrases and document metadata to improve topic modeling of clinical reports.

J Biomed Inform. 2016 Jun;61:260-6. doi: 10.1016/j.jbi.2016.04.005. Epub 2016 Apr 21.

3

Redundancy-aware topic modeling for patient record notes.

PLoS One. 2014 Feb 13;9(2):e87555. doi: 10.1371/journal.pone.0087555. eCollection 2014.

4

Designing and evaluating a clustering system for organizing and integrating patient drug outcomes in personal health messages.

AMIA Annu Symp Proc. 2012;2012:417-26. Epub 2012 Nov 3.

5

Query log analysis of an electronic health record search engine.

AMIA Annu Symp Proc. 2011;2011:915-24. Epub 2011 Oct 22.

References

1

A taxonomy characterizing complexity of consumer eHealth Literacy.

AMIA Annu Symp Proc. 2009 Nov 14;2009:86-90.

2

Assessing consumer health vocabulary familiarity: an exploratory study.

J Med Internet Res. 2007 Mar 14;9(1):e5. doi: 10.2196/jmir.9.1.e5.

3

MedicoPort: a medical search engine for all.

Comput Methods Programs Biomed. 2007 Apr;86(1):73-86. doi: 10.1016/j.cmpb.2007.01.007. Epub 2007 Feb 22.

4

Assisting consumer health information retrieval with query recommendations.

J Am Med Inform Assoc. 2006 Jan-Feb;13(1):80-90. doi: 10.1197/jamia.M1820. Epub 2005 Oct 12.

5

Characteristics of consumer terminology for health information retrieval.

Methods Inf Med. 2002;41(4):289-98.
