Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY.
School of Computing, Queen's University, Kingston, Ontario, Canada.
JCO Clin Cancer Inform. 2022 Jan;6:e2100104. doi: 10.1200/CCI.21.00104.
To assess the accuracy of a natural language processing (NLP) model in identifying splenomegaly described in structured computed tomography radiology reports of patients with cancer.
In this retrospective study covering July 2009 to April 2019, 387,359 consecutive structured radiology reports for computed tomography scans of the chest, abdomen, and pelvis from 91,665 patients spanning 30 types of cancer were included. A randomized sample of 2,022 reports from patients with colorectal cancer, hepatobiliary cancer (HB), leukemia, Hodgkin lymphoma (HL), or non-Hodgkin lymphoma (non-HL) was manually annotated as positive or negative for splenomegaly. NLP model training and testing were performed on 1,617 and 405 reports, respectively, and a new validation set of 400 reports from all cancer subtypes was used to test NLP model accuracy, precision, and recall. Overall survival was compared between patients with and without splenomegaly using Kaplan-Meier curves.
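As an illustration of the survival comparison mentioned above, the following is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. It is not the authors' code: the function name `kaplan_meier` and the toy follow-up data are hypothetical, and the abstract does not say which software was used. In practice the estimator would be run once per group (with and without a splenomegaly label) and the two curves compared.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct death time.

    durations: follow-up time for each patient
    events:    1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # deaths plus censorings leaving the risk set at t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data (months of follow-up), not taken from the study.
toy_durations = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_durations, toy_events)
```

Censored patients lower the number at risk without producing a step in the curve, which is why the estimator can use patients with incomplete follow-up.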
The final cohort included 387,359 reports from 91,665 patients (mean age 60.8 years; 51.2% women). In the testing set, the model achieved accuracy of 92.1%, precision of 92.2%, and recall of 92.1% for splenomegaly. In the validation set, accuracy, precision, and recall were 93.8%, 92.9%, and 86.7%, respectively. In the entire cohort, splenomegaly was most frequent in patients with leukemia (32.5%), followed by HB (17.4%), non-HL (9.1%), colorectal cancer (8.5%), and HL (5.6%). A splenomegaly label was associated with an increased risk of mortality in the entire cohort (hazard ratio 2.10; 95% CI, 1.98 to 2.22; P < .001).
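For readers less familiar with the three metrics reported above, the sketch below shows how accuracy, precision, and recall are conventionally computed from manual annotations and model predictions. It assumes binary labels (1 = splenomegaly, 0 = no splenomegaly) and scores the positive class only; the abstract does not state whether the reported figures were computed per class or averaged, so this is an assumption. The function name `binary_metrics` and the toy labels are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Precision: of reports labeled positive, the share that truly are.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: of truly positive reports, the share the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy labels, not taken from the study.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
toy_metrics = binary_metrics(toy_true, toy_pred)
```

A recall lower than precision, as in the validation set, means the model misses some positive reports more often than it raises false alarms.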
Automated splenomegaly labeling by NLP of radiology reports demonstrates good accuracy, precision, and recall. Splenomegaly is most frequently reported in patients with leukemia, followed by patients with HB.