Department of Computer Science and Systems Technology, University of Pannonia, Veszprém, Hungary.
1st Department of Cardiology, State Hospital for Cardiology, Balatonfüred, Hungary.
Artif Intell Med. 2023 Sep;143:102584. doi: 10.1016/j.artmed.2023.102584. Epub 2023 May 20.
In everyday medical practice, the results of cardiac ultrasound examinations are generally recorded as unstructured text, and extracting the relevant information from it is an important and challenging task. This paper presents a generally applicable, language- and corpus-independent text mining method for extracting and structuring numerical measurement results and their descriptions from echocardiography reports.
The developed method builds on generally applicable text mining preprocessing steps. It automatically identifies and standardizes the descriptions of cardiac ultrasound measures, and it stores the standardized descriptions together with their measurement results in a structured form for later use. The method does not contain any regular-expression-based search and does not rely on information about the structure of the document.
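To make the idea of regex-free extraction and standardization concrete, the following is a minimal sketch and not the algorithm of the paper, whose details the abstract does not give. It assumes a small hypothetical vocabulary (`VARIANTS`), whitespace tokenization, and approximate string matching from Python's standard `difflib` module to absorb spelling errors; the function names `to_number` and `extract`, the `cutoff` and `window` parameters, and the sample report text are all illustrative assumptions.

```python
from difflib import get_close_matches

# Hypothetical vocabulary (not from the paper): known spellings and
# abbreviations mapped to one standardized description.
VARIANTS = {
    "ef": "ejection fraction",
    "ejection fraction": "ejection fraction",
    "lvedd": "lv end-diastolic diameter",
}

def to_number(token):
    """Return the token as a float, or None if it is not a number."""
    if not token[:1].isdigit():
        return None
    try:
        # Accept a decimal comma as well as a decimal point.
        return float(token.replace(",", "."))
    except ValueError:
        return None

def extract(text, cutoff=0.8, window=2):
    """Pair each number with the closest known description preceding it."""
    tokens = text.lower().replace(":", " ").replace("=", " ").split()
    results, label = {}, []
    for tok in tokens:
        value = to_number(tok)
        if value is None:
            # Keep only tokens containing letters as description candidates.
            if any(ch.isalpha() for ch in tok):
                label.append(tok)
            continue
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(window, len(label)), 0, -1):
            phrase = " ".join(label[-n:])
            match = get_close_matches(phrase, list(VARIANTS), n=1, cutoff=cutoff)
            if match:
                results[VARIANTS[match[0]]] = value
                break
        label = []  # start collecting the next description
    return results

print(extract("Ejection fracton: 55 % LVEDD 48 mm"))
```

In this sketch the misspelled "fracton" is still mapped to the standardized name "ejection fraction", and the abbreviation "LVEDD" is resolved through the vocabulary. A real system would need a far richer vocabulary and a more careful treatment of units and context.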
The method was tested on a document set containing more than 20,000 echocardiographic reports by examining how well it extracted 12 echocardiography parameters considered important by experts. The method extracted and structured the parameters under study with good sensitivity (lowest value: 0.775, highest value: 1.0, average: 0.904) and excellent specificity (1.0 in all cases). The F1 score ranged between 0.873 and 1.0, with an average of 0.948.
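The relationship between the reported figures can be checked with a short calculation. The sketch below assumes the standard definitions of sensitivity, specificity, precision and F1, computed per parameter from a confusion matrix; the counts are hypothetical and chosen only to reproduce the lowest reported sensitivity (0.775) with no false positives. The function name `metrics` is an illustrative assumption.

```python
def metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, F1) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: share of true values found
    specificity = tn / (tn + fp)   # share of negatives correctly rejected
    precision = tp / (tp + fp)     # share of extracted values that are right
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1

# Hypothetical counts (not from the paper): 775 of 1000 values found,
# no false positives.
sens, spec, f1 = metrics(tp=775, fp=0, fn=225, tn=1000)
print(round(sens, 3), round(spec, 3), round(f1, 3))  # 0.775 1.0 0.873
```

With a specificity of 1.0 there are no false positives, so precision is 1 and F1 reduces to 2s / (1 + s) for sensitivity s. For s = 0.775 this gives 0.873, which agrees with the lowest reported F1 score.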
The case study shows that the proposed method can extract measurement results from echocardiography documents with high confidence without performing a direct search or requiring detailed information about data recording habits. Furthermore, it effectively handles spelling errors, abbreviations and the highly varied terminology used in descriptions. As it does not rely on any information about the structure or language of the documents or about data recording habits, it can be applied to processing any free-text medical document.