An experiment comparing lexical and statistical methods for extracting MeSH terms from clinical free text.

Authors

Cooper G F, Miller R A

Affiliation

Center for Biomedical Informatics, University of Pittsburgh, PA 15213-2582, USA.

Publication

J Am Med Inform Assoc. 1998 Jan-Feb;5(1):62-75. doi: 10.1136/jamia.1998.0050062.

DOI:10.1136/jamia.1998.0050062

PMID:9452986

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC61276/

Abstract

OBJECTIVE

A primary goal of the University of Pittsburgh's 1990-94 UMLS-sponsored effort was to develop and evaluate PostDoc (a lexical indexing system) and Pindex (a statistical indexing system) comparatively, and then in combination as a hybrid system. Each system takes as input a portion of the free text from a narrative part of a patient's electronic medical record and returns a list of suggested MeSH terms to use in formulating a Medline search that includes concepts in the text. This paper describes the systems and reports an evaluation. The intent is for this evaluation to serve as a step toward the eventual realization of systems that assist healthcare personnel in using the electronic medical record to construct patient-specific searches of Medline.

DESIGN

The authors tested the performances of PostDoc, Pindex, and a hybrid system, using text taken from randomly selected clinical records, which were stratified to include six radiology reports, six pathology reports, and six discharge summaries. They identified concepts in the clinical records that might conceivably be used in performing a patient-specific Medline search. Each system was given the free text of each record as an input. The extent to which a system-derived list of MeSH terms captured the relevant concepts in these documents was determined based on blinded assessments by the authors.

RESULTS

PostDoc output a mean of approximately 19 MeSH terms per report, which included about 40% of the relevant report concepts. Pindex output a mean of approximately 57 terms per report and captured about 45% of the relevant report concepts. A hybrid system captured approximately 66% of the relevant concepts and output about 71 terms per report.
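The capture percentages above can be read as a recall-style measure: the share of relevant report concepts that appear in a system's suggested term list. The sketch below illustrates that calculation and one possible way to combine two lists. It assumes the hybrid is a plain de-duplicated union, which the abstract does not state; the function names and the toy term lists are invented for illustration and are not data from the study.

```python
def capture_rate(suggested_terms, relevant_concepts):
    """Fraction of relevant concepts that appear in the suggested term list."""
    relevant = set(relevant_concepts)
    if not relevant:
        return 0.0
    return len(relevant & set(suggested_terms)) / len(relevant)


def hybrid(lexical_terms, statistical_terms):
    """ASSUMPTION: the hybrid is an order-preserving, de-duplicated union."""
    seen = set()
    merged = []
    for term in list(lexical_terms) + list(statistical_terms):
        if term not in seen:
            seen.add(term)
            merged.append(term)
    return merged


# Toy data, invented for illustration only (not taken from the paper).
relevant = ["Pneumonia", "Pleural Effusion", "Lung Neoplasms",
            "Biopsy", "Radiography, Thoracic"]
lexical_out = ["Pneumonia", "Biopsy", "Lung"]
statistical_out = ["Pleural Effusion", "Pneumonia",
                   "Radiography, Thoracic", "Hospitals"]

hybrid_out = hybrid(lexical_out, statistical_out)
lexical_rate = capture_rate(lexical_out, relevant)          # 2 of 5 concepts
statistical_rate = capture_rate(statistical_out, relevant)  # 3 of 5 concepts
hybrid_rate = capture_rate(hybrid_out, relevant)            # 4 of 5 concepts
```

In this toy case the union captures more concepts than either list alone while being shorter than the two lists added together, because the shared term is counted once. The reported means follow the same pattern (19 + 57 = 76 against about 71 for the hybrid), which would correspond to roughly five shared terms per report if the hybrid were a simple union; that reading is an inference, not something the abstract confirms.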

CONCLUSION

The outputs of PostDoc and Pindex are complementary in capturing MeSH terms from clinical free text. The results suggest possible approaches to reduce the number of terms output while maintaining the percentage of terms captured, including the use of UMLS semantic types to constrain the output list to contain only clinically relevant MeSH terms.
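The semantic-type idea in the conclusion could look something like the following minimal sketch. The lookup table, the set of allowed types, and the function name are all hypothetical: a real system would read each term's semantic type from the UMLS, and the abstract does not say which types the authors would treat as clinically relevant.

```python
# Hypothetical term -> semantic type lookup (illustrative entries only;
# a real system would obtain these assignments from the UMLS).
SEMANTIC_TYPE = {
    "Pneumonia": "Disease or Syndrome",
    "Biopsy": "Diagnostic Procedure",
    "Hospitals": "Health Care Related Organization",
    "Lung": "Body Part, Organ, or Organ Component",
}

# ASSUMPTION: which semantic types count as clinically relevant.
CLINICAL_TYPES = {"Disease or Syndrome", "Diagnostic Procedure"}


def filter_by_semantic_type(terms, type_of=SEMANTIC_TYPE, allowed=CLINICAL_TYPES):
    """Keep only terms whose semantic type is in the allowed set.

    Terms with no known semantic type are dropped.
    """
    return [term for term in terms if type_of.get(term) in allowed]


suggested = ["Pneumonia", "Hospitals", "Biopsy", "Lung", "Unknown Term"]
filtered = filter_by_semantic_type(suggested)
```

A filter of this kind shortens the output list without touching terms of the allowed types, so the capture percentage is preserved only to the extent that the relevant concepts fall within those types.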
