Lee Kyeryoung, Liu Zongzhi, Chandran Urmila, Kalsekar Iftekhar, Laxmanan Balaji, Higashi Mitchell K, Jun Tomi, Ma Meng, Li Minghao, Mai Yun, Gilman Christopher, Wang Tongyu, Ai Lei, Aggarwal Parag, Pan Qi, Oh William, Stolovitzky Gustavo, Schadt Eric, Wang Xiaoyan
Sema4, Stamford, CT, United States.
Lung Cancer Initiative, Johnson & Johnson, New Brunswick, NJ, United States.
JMIR AI. 2023 Jun 1;2:e44537. doi: 10.2196/44537.
Ground-glass opacities (GGOs) appearing in computed tomography (CT) scans may indicate potential lung malignancy. Proper management of GGOs based on their features can prevent the development of lung cancer. Electronic health records are rich sources of information on GGO nodules and their granular features, but most of the valuable information is embedded in unstructured clinical notes.
We aimed to develop, test, and validate a deep learning-based natural language processing (NLP) tool that automatically extracts GGO features to inform the longitudinal trajectory of GGO status from large-scale radiology notes.
We developed a deep learning NLP pipeline based on a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF) to retrospectively extract GGOs and their granular features from the radiology notes of 13,216 lung cancer patients. We evaluated the pipeline with quality assessments and characterized the cohort by analyzing the longitudinal distribution of nodule features to assess changes in size and solidity over time.
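The abstract does not describe the decoding step. In a BiLSTM-CRF tagger, however, tokens are conventionally labeled by combining per-token emission scores (produced by the BiLSTM) with tag-to-tag transition scores (learned by the CRF layer), and the single best tag sequence is then chosen with Viterbi decoding. The sketch below shows only that decoding step. It uses a hypothetical BIO tag set (O, B-GGO, I-GGO) and hand-set toy scores. It is not the authors' implementation.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence for one sentence.

    emissions[t][k]   -- score of tag k at token t (e.g. BiLSTM outputs)
    transitions[i][j] -- CRF score for moving from tag i to tag j
    """
    n_tags = len(transitions)
    # scores[k]: best total score of any path ending in tag k at the current token
    scores = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for j in range(n_tags):
            # best previous tag i when the current tag is j
            best_i = max(range(n_tags), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # trace the best path backwards from the best final tag
    path = [max(range(n_tags), key=lambda k: scores[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tags and toy scores, for illustration only.
TAGS = ["O", "B-GGO", "I-GGO"]
tokens = ["ground-glass", "opacity", "noted"]
emissions = [[0.0, 2.0, 0.5], [0.0, 1.0, 0.8], [2.0, 0.0, 0.0]]
transitions = [
    [0.0, 0.0, -1e4],  # from O: O -> I-GGO is effectively forbidden
    [0.0, 0.0, 1.0],   # from B-GGO: continuing the span is rewarded
    [0.0, 0.0, 1.0],   # from I-GGO
]
labels = [TAGS[k] for k in viterbi_decode(emissions, transitions)]
```

Here the emission score alone would tag "opacity" as B-GGO. The B-GGO to I-GGO transition bonus makes the decoder choose I-GGO instead, so the result is `["B-GGO", "I-GGO", "O"]`. This is the kind of sequence-level consistency the CRF layer contributes on top of the BiLSTM.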
Our NLP pipeline, built on the GGO ontology we developed, achieved precision of 95% to 100%, recall of 89% to 100%, and F-scores of 92% to 100% across GGO features. We deployed this GGO NLP model to extract and structure comprehensive characteristics of GGOs from 29,496 radiology notes of 4521 lung cancer patients. Longitudinal analysis revealed that, in the last note compared with the first note, nodule size had increased in 16.8% (240/1424) of patients, decreased in 14.6% (208/1424), and remained unchanged in 68.5% (976/1424). Among 1127 patients who had longitudinal radiology notes on GGO status, 815 (72.3%) were reported to have stable status and 259 (23%) had increased/progressed status in subsequent notes.
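The size comparison above amounts to comparing each patient's first and last reported nodule size. The following is only a minimal sketch. It assumes a hypothetical per-note record of (note date, largest GGO size in mm) and a hypothetical `tolerance_mm` threshold. The abstract does not state how size changes were operationalized, how multiple nodules per patient were handled, or exactly how the 1424 patients were selected.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical record layout: (note date, largest GGO size in mm) for one note.
Note = Tuple[date, float]


def size_trend(notes: List[Note], tolerance_mm: float = 0.0) -> str:
    """Classify the change from the first to the last note by date."""
    ordered = sorted(notes, key=lambda n: n[0])
    delta = ordered[-1][1] - ordered[0][1]
    if delta > tolerance_mm:
        return "increased"
    if delta < -tolerance_mm:
        return "decreased"
    return "unchanged"


def summarize(patients: Dict[str, List[Note]]) -> Dict[str, Tuple[int, float]]:
    """Count patients per trend (n, %), keeping only those with >= 2 notes."""
    eligible = [notes for notes in patients.values() if len(notes) >= 2]
    counts = Counter(size_trend(notes) for notes in eligible)
    total = len(eligible)
    return {trend: (n, 100 * n / total) for trend, n in counts.items()}


# Toy, made-up patients for illustration; p3 has a single note and is excluded.
patients = {
    "p1": [(date(2020, 1, 1), 6.0), (date(2021, 1, 1), 8.0)],
    "p2": [(date(2020, 3, 1), 5.0), (date(2020, 9, 1), 5.0)],
    "p3": [(date(2019, 5, 1), 10.0)],
}
summary = summarize(patients)
```

With these toy inputs, `summary` maps "increased" and "unchanged" to one patient each (50.0%). A nonzero `tolerance_mm` would treat small measurement differences between reports as "unchanged". Whether the study used such a threshold is not stated in the abstract.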
Our deep learning-based NLP pipeline can automatically extract granular GGO features at scale from electronic health records when this information is documented in radiology notes and help inform the natural history of GGO. This will open the way for a new paradigm in lung cancer prevention and early detection.