Terminology extraction from medical texts in Polish.

Authors

Marciniak Małgorzata, Mykowiecka Agnieszka

Affiliation

Institute of Computer Science PAS, Jana Kazimierza 5, 01-248 Warsaw, Poland.

Publication information

J Biomed Semantics. 2014 May 31;5:24. doi: 10.1186/2041-1480-5-24. eCollection 2014.

Abstract

BACKGROUND

Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in a specialized language containing medical terminology related to hospital treatment. Processing them automatically can help verify the consistency of hospital documentation and obtain statistical data. To perform this task, we need information on the phrases we are looking for. At the moment, clinical resources for Polish are sparse. The existing terminologies, such as the Polish Medical Subject Headings (MeSH), do not provide sufficient coverage for clinical tasks. It would therefore be helpful to automatically prepare, on the basis of a data sample, an initial set of terms that, after manual verification, could be used for information extraction.

RESULTS

Using a combination of linguistic and statistical methods to process over 1200 children's hospital discharge records, we obtained a list of single-word and multiword terms used in hospital discharge documents written in Polish. The phrases are ordered by their presumed importance in domain texts, measured by the frequency of use of a phrase and the variety of its contexts. The evaluation showed that the automatically identified phrases cover about 84% of terms in domain texts. Among the top 400 terms of the ranked list, only 4% were incorrect, while among the final 200, 20% of expressions were either not domain related or syntactically incorrect. We also observed that 70% of the obtained terms are not included in the Polish MeSH.
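To make the ranking idea concrete, the following is a minimal sketch of ordering candidate phrases by frequency of use and variety of contexts. It is not the authors' procedure, which the abstract does not specify: the function name `rank_phrases`, the scoring formula (frequency times the log of the number of distinct neighbouring-word pairs), and the toy sentences are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def rank_phrases(sentences, candidates):
    """Rank candidate phrases by frequency and variety of contexts.

    sentences: list of token lists.
    candidates: iterable of token tuples (hypothetical, pre-extracted phrases).
    Returns a list of (phrase, score) pairs, highest score first.
    """
    candidates = set(candidates)
    max_len = max((len(c) for c in candidates), default=0)
    freq = defaultdict(int)
    contexts = defaultdict(set)
    for tokens in sentences:
        n = len(tokens)
        for start in range(n):
            for length in range(1, max_len + 1):
                end = start + length
                if end > n:
                    break
                phrase = tuple(tokens[start:end])
                if phrase not in candidates:
                    continue
                freq[phrase] += 1
                # A context is the pair of words directly left and right.
                left = tokens[start - 1] if start > 0 else "<s>"
                right = tokens[end] if end < n else "</s>"
                contexts[phrase].add((left, right))
    # Assumed score: frequency weighted by the log of context variety.
    scored = [(p, freq[p] * math.log2(1 + len(contexts[p]))) for p in freq]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy data, invented for illustration only.
sentences = [
    ["rozpoznano", "zapalenie", "płuc"],
    ["leczono", "zapalenie", "płuc", "antybiotykiem"],
    ["rozpoznano", "zapalenie", "ucha"],
]
candidates = [("zapalenie", "płuc"), ("zapalenie", "ucha"), ("zapalenie",)]
ranked = rank_phrases(sentences, candidates)
for phrase, score in ranked:
    print(" ".join(phrase), round(score, 2))
```

In this toy run the single word "zapalenie" ranks first because it is both the most frequent candidate and the one seen in the most distinct contexts. A real pipeline would first need linguistic processing (tagging and noun-phrase detection for Polish) to produce the candidates.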

CONCLUSIONS

Automatic terminology extraction can give results of a quality high enough to serve as a starting point for building domain-related terminological dictionaries or ontologies. This approach can be useful for preparing terminological resources for very specific subdomains for which no relevant terminologies exist yet. The evaluation showed that none of the tested ranking procedures was able to filter out all improperly constructed noun phrases from the top of the list. Careful choice of noun phrases is crucial to the usefulness of the created terminological resource in applications such as lexicon construction or acquisition of semantic relations from texts.
