Department of Medicine and Surgery, Pathology, IRCCS Fondazione San Gerardo dei Tintori, University of Milano-Bicocca, Italy.
Section of Pathology, Department of Medical and Surgical Sciences for Children and Adults, University of Modena and Reggio Emilia, University Hospital of Modena, Modena, Italy.
Pathologica. 2023 Dec;115(6):318-324. doi: 10.32074/1591-951X-952.
The use of standardized structured reports (SSR) and suitable terminologies such as SNOMED-CT can enhance data retrieval and analysis, fostering large-scale studies and collaboration. However, narrative reports remain highly prevalent in our laboratories, which calls for alternative, automated labeling approaches. In this project, natural language processing (NLP) methods were used to assign SNOMED-CT codes to structured and unstructured reports from an Italian Digital Pathology Department.
Two NLP-based automatic coding systems, a support vector machine (SVM) and a long short-term memory (LSTM) network, were trained and applied to a series of narrative reports.
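As a rough illustration of the SVM arm of such a coding system, the sketch below trains a linear SVM on TF-IDF features with scikit-learn. It is not the authors' pipeline: the toy English report snippets, the placeholder labels (which are not real SNOMED-CT codes), the n-gram range, and the choice of `LinearSVC` are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus: invented report snippets with placeholder labels.
# The labels stand in for diagnosis codes and are NOT real SNOMED-CT codes.
train_texts = [
    "colon biopsy with invasive adenocarcinoma",
    "adenocarcinoma of the colon, moderately differentiated",
    "skin excision showing malignant melanoma",
    "melanoma of the skin, superficial spreading type",
    "gastric biopsy with chronic gastritis",
    "chronic gastritis, no evidence of malignancy",
]
train_labels = [
    "ADENOCARCINOMA",
    "ADENOCARCINOMA",
    "MELANOMA",
    "MELANOMA",
    "GASTRITIS",
    "GASTRITIS",
]

# TF-IDF over unigrams and bigrams feeds a linear SVM (one-vs-rest for multiclass).
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LinearSVC(random_state=0),
)
model.fit(train_texts, train_labels)

# Assign a label to unseen narrative text.
new_reports = [
    "invasive adenocarcinoma of the colon",
    "malignant melanoma of the skin",
    "chronic gastritis",
]
predicted = list(model.predict(new_reports))
for text, label in zip(new_reports, predicted):
    print(f"{label:15s} <- {text}")
```

In a real archive the labels would be the SNOMED-CT codes already attached to a manually coded subset of reports, and the texts would be the Italian narrative reports themselves.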
Both algorithms were tested on 1163 cases and performed well in terms of accuracy, precision, recall, and F1 score, with SVM slightly outperforming LSTM (0.84, 0.87, 0.83, and 0.82 vs 0.83, 0.85, 0.83, and 0.82, respectively). Integrating explainability allowed the identification of important terms and groups of words, enabling fine-tuning that balances semantic meaning with model performance.
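The four reported metrics can be computed from paired true and predicted labels as in the minimal sketch below. The abstract does not state how per-class scores were averaged, so macro-averaging is an assumption here, and the function name `classification_scores` and the toy labels are hypothetical.

```python
from collections import Counter


def classification_scores(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted_count = tp[label] + fp[label]
        actual_count = tp[label] + fn[label]
        p = tp[label] / predicted_count if predicted_count else 0.0
        r = tp[label] / actual_count if actual_count else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n_labels = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n_labels,
        "recall": sum(recalls) / n_labels,
        "f1": sum(f1s) / n_labels,
    }


# Toy example with three placeholder classes.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
scores = classification_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Macro-averaging weights every code equally regardless of frequency; a weighted average would instead favor the most common codes, which can matter when the label distribution is imbalanced.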
AI tools allow automatic SNOMED-CT labeling of pathology archives, providing a retrospective fix for the widespread lack of organization of narrative reports.