Puts Sander, Nobel Martijn, Zegers Catharina, Bermejo Iñigo, Robben Simon, Dekker Andre
GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, Maastricht, Netherlands.
Department of Radiation Oncology, Maastro, Maastricht, Netherlands.
JMIR Form Res. 2023 Mar 22;7:e38125. doi: 10.2196/38125.
Natural language processing (NLP) is considered a promising way to extract concepts from free text and store them in structured form for data mining. This applies to radiology reports, which still consist mostly of free text. Accurate and complete reports are essential for clinical decision support, for instance in oncological staging. NLP can therefore be used to structure the content of radiology reports, increasing their value.
This study describes the implementation and validation of an N-stage classifier for pulmonary oncology that operates on free-text radiological chest computed tomography reports and follows the tumor, node, and metastasis (TNM) classification. The N-stage classifier was added to an existing T-stage classifier to create a combined TN-stage classifier.
SpaCy, PyContextNLP, and regular expressions were used for information extraction, with additional rules defined to extract the N-stage accurately.
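To illustrate the general idea of rule-based N-stage extraction, the following is a minimal sketch using only Python's standard `re` module. It is not the rule set used in the study and does not use SpaCy or PyContextNLP: the patterns, the sentence-level negation check, and the function name `classify_n_stage` are all hypothetical simplifications. The station-to-stage mapping follows the general TNM convention for lung cancer (hilar N1, mediastinal or subcarinal N2, contralateral or supraclavicular N3) and assumes nodes are ipsilateral unless the report says otherwise.

```python
import re

# Hypothetical, highly simplified patterns; NOT the rules used in the study.
NODE = re.compile(r"\b(lymph nodes?|lymphadenopathy|adenopathy)\b", re.IGNORECASE)
SUSPICIOUS = re.compile(r"\b(enlarged|suspicious|pathological(?:ly)?|metasta\w*)\b",
                        re.IGNORECASE)
NEGATION = re.compile(r"\b(no|not|without)\b", re.IGNORECASE)

# Checked in order; the first matching pattern decides the stage for a sentence.
STATION_RULES = [
    (re.compile(r"\b(supraclavicular|scalene)\b", re.IGNORECASE), "N3"),
    (re.compile(r"\bcontralateral\b", re.IGNORECASE), "N3"),
    (re.compile(r"\b(mediastinal|subcarinal)\b", re.IGNORECASE), "N2"),
    (re.compile(r"\b(hilar|peribronchial)\b", re.IGNORECASE), "N1"),
]


def classify_n_stage(report: str) -> str:
    """Return the highest N-stage supported by any non-negated sentence."""
    stage = "N0"
    # Crude sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        # Only consider sentences that mention a suspicious lymph node.
        if not (NODE.search(sentence) and SUSPICIOUS.search(sentence)):
            continue
        # Crude negation: any trigger word in the sentence negates the finding.
        if NEGATION.search(sentence):
            continue
        for pattern, label in STATION_RULES:
            if pattern.search(sentence):
                # "N0" < "N1" < "N2" < "N3" holds for plain string comparison.
                stage = max(stage, label)
                break
    return stage


print(classify_n_stage("Suspicious subcarinal lymph node. "
                       "No enlarged supraclavicular lymph nodes."))
```

A dedicated context algorithm such as the one PyContextNLP provides handles negation scope far more carefully than the whole-sentence check used here, which is one reason such tools are combined with regular expressions in practice.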
The combined TN-stage classifier achieved accuracies of 0.84 on the training set (N=95) and 0.85 on the validation set (N=97), comparable to the T-stage classifier (0.87-0.92).
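As a sketch of how an accuracy score for a combined classifier could be computed, the snippet below counts a report as correct only when both the T- and N-stage match the reference. That scoring convention is an assumption made for illustration, since the abstract does not state how the combined score was defined, and the labels shown are made-up examples, not study data.

```python
def accuracy(predicted, reference):
    """Fraction of reports whose predicted label equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one labelled report is required")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Made-up (T-stage, N-stage) pairs for four hypothetical reports.
reference = [("T1", "N0"), ("T2", "N1"), ("T3", "N2"), ("T4", "N3")]
predicted = [("T1", "N0"), ("T2", "N1"), ("T3", "N1"), ("T4", "N3")]

# Assumption: a combined TN prediction counts only if both stages match.
print(accuracy(predicted, reference))
```

Under this convention the combined score can never exceed the score of either component classifier on the same reports.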
This study shows that NLP has potential for staging pulmonary tumors from free-text radiological reports according to the TNM classification system, as both the T- and N-stages can be extracted with high accuracy.