Wu Pei-Hao, Yu Avon, Tsai Ching-Wei, Koh Jia-Ling, Kuo Chin-Chi, Chen Arbee L P
1. Department of Computer Science and Information Engineering, National Taiwan Normal University, Taipei, Taiwan.
2. Big Data Center and Nephrology Division, China Medical University Hospital and College of Medicine, China Medical University, Taichung, Taiwan.
Health Inf Sci Syst. 2020 Apr 3;8(1):18. doi: 10.1007/s13755-020-00108-6. eCollection 2020 Dec.
In recent years, rapid advances in medical technology have meant that patients typically undergo more accurate and detailed examinations. Many examination reports are not expressed as numerical data but as text documents written by medical examiners based on observations from instruments and biochemical tests. If this unstructured data can be organized into a structured report, doctors can understand a patient's status across the various examinations more efficiently. In addition, association analysis can be performed on the structured data to identify potential factors that affect a disease.
In this paper, we apply part-of-speech (POS) tagging results from natural language analysis to automatically extract keyword phrases from pathology examination reports of renal diseases. We then establish a medical dictionary for the various examination items in a report, which provides the basic information for retrieving the terms used to construct a structured form of the report. Moreover, a probabilistic topic modeling method is applied to automatically discover candidate keyword phrases for the examination items from the reports. Finally, a system is implemented that generates the structured form for the various examination items in a report according to the constructed medical dictionary.
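As a rough illustration of two of the steps described above, the following minimal Python sketch extracts candidate phrases from POS-tagged tokens and then assigns report sentences to examination items using a dictionary. It is not the authors' implementation: the function names, the Penn Treebank style tags, the adjective/noun chunking rule, and every entry in `MEDICAL_DICTIONARY` are assumptions made for the example, and the input is assumed to be already tagged by some external POS tagger.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical dictionary: examination item -> keyword phrases that signal it.
# These entries are illustrative only and are not taken from the paper.
MEDICAL_DICTIONARY: Dict[str, List[str]] = {
    "glomeruli": ["glomeruli", "glomerulus", "global sclerosis"],
    "tubules": ["tubular atrophy", "tubules"],
    "interstitium": ["interstitial fibrosis", "interstitium"],
}


def extract_noun_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    """Collect maximal runs of adjectives/nouns that end in a noun.

    `tagged` is a list of (word, POS tag) pairs; tags starting with
    "JJ" are treated as adjectives and tags starting with "NN" as nouns
    (an assumed tag set, not necessarily the one used in the paper).
    """
    phrases: List[str] = []
    run: List[Tuple[str, str]] = []
    # A sentinel token flushes the final run at the end of the input.
    for word, tag in tagged + [("", "END")]:
        if tag.startswith("JJ") or tag.startswith("NN"):
            run.append((word, tag))
            continue
        # Drop trailing adjectives so that each phrase ends in a noun.
        while run and not run[-1][1].startswith("NN"):
            run.pop()
        if run:
            phrases.append(" ".join(w for w, _ in run))
        run = []
    return phrases


def structure_report(text: str, dictionary: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each examination item to the report sentences that mention it."""
    # Naive sentence split on '.' or ';' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.;])\s+", text) if s.strip()]
    structured: Dict[str, List[str]] = {item: [] for item in dictionary}
    for sentence in sentences:
        lowered = sentence.lower()
        for item, keywords in dictionary.items():
            # Whole-word, case-insensitive match of any keyword phrase.
            if any(re.search(r"\b" + re.escape(k) + r"\b", lowered) for k in keywords):
                structured[item].append(sentence)
    return structured


if __name__ == "__main__":
    tagged = [("moderate", "JJ"), ("interstitial", "JJ"), ("fibrosis", "NN"),
              ("is", "VBZ"), ("present", "JJ")]
    print(extract_noun_phrases(tagged))
    report = "Ten glomeruli are present. Moderate interstitial fibrosis is noted."
    print(structure_report(report, MEDICAL_DICTIONARY))
```

In this sketch the dictionary is written by hand; in the workflow described above, candidate phrases for each examination item would instead come from the phrase extraction and topic modeling steps.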
The experimental results show that the proposed methods can effectively construct a structured form of examination reports. Furthermore, the keywords of common examination items can be extracted correctly. These techniques will support the automatic processing and analysis of medical text reports.