McKinney Alexander M, Moore Jessica A, Campbell Kevin, Braga Thiago A, Rykken Jeffrey B, Jagadeesan Bharathi D, McKinney Zeke J
Department of Radiology, University of Miami-Miller School of Medicine, Miami, FL, USA.
University of Minnesota, St. Paul, Minnesota, USA.
Heliyon. 2024 May 7;10(10):e30106. doi: 10.1016/j.heliyon.2024.e30106. eCollection 2024 May 30.
Natural language processing (NLP) can generate diagnosis codes from imaging reports. Meanwhile, International Classification of Diseases, 10th revision (ICD-10) codes are the United States' standard for billing and coding, enabling the tracking of disease burden and outcomes. This cross-sectional study aimed to test the feasibility of an NLP algorithm by evaluating its performance and comparing it with radiologists' and physicians' manual coding.
Three neuroradiologist reviewers and one non-radiologist physician reviewer manually coded a randomly selected sample of 200 craniospinal CT and MRI reports drawn from a pool of >10,000. The NLP algorithm subdivided each report's Impression into "phrases", generating multiple ICD-10 matches for each phrase. Viewing only the Impression, the physician reviewers selected the single best ICD-10 code for each phrase. The codes selected by the physicians and by the algorithm were then compared for agreement.
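As an illustration of the phrase-splitting and candidate-matching step described above, the following is a minimal Python sketch assuming a toy ICD-10 dictionary and simple token-overlap scoring. The study's engine is not public, so every name, rule, and code description below is a hypothetical stand-in, not the published method.

    import re

    # Hypothetical mini ICD-10 dictionary (code -> description); a real
    # engine would use the full ICD-10-CM code set.
    ICD10 = {
        "I61.9": "nontraumatic intracerebral hemorrhage unspecified",
        "I63.9": "cerebral infarction unspecified",
        "G93.6": "cerebral edema",
        "S06.0X0A": "concussion without loss of consciousness initial encounter",
    }

    def split_phrases(impression):
        """Split an Impression section into phrases on list numbers and periods."""
        parts = re.split(r"(?:\d+\.\s*|[.;]\s+)", impression)
        return [p.strip(" .").lower() for p in parts if p.strip(" .")]

    def rank_codes(phrase, top_k=5):
        """Rank ICD-10 candidates by token overlap with each code description."""
        tokens = set(phrase.split())
        scored = [
            (code, len(tokens & set(desc.split())) / len(desc.split()))
            for code, desc in ICD10.items()
            if tokens & set(desc.split())
        ]
        return sorted(scored, key=lambda s: -s[1])[:top_k]

    impression = "1. Acute intracerebral hemorrhage. 2. Mild cerebral edema."
    for phrase in split_phrases(impression):
        print(phrase, "->", rank_codes(phrase))

A production engine would add negation handling and fuzzier matching, but the ranked top-k output per phrase mirrors the structure the reviewers were asked to adjudicate.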
The algorithm extracted the reports' Impressions into 645 phrases, each with ranked ICD-10 matches. Pairwise agreement among the reviewers' selected codes was unreliable. Using unanimous reviewer agreement as "ground truth", the algorithm's sensitivity, specificity, and F2 score were computed for its top 5 codes and for its single best code. The engine also tabulated "pertinent negatives" as negative codes for explicitly negated findings (e.g., "no intracranial hemorrhage"). The engine's matching was more specific for shorter (truncated) ICD-10 codes than for full-length codes.
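The evaluation summarized above can be sketched as follows; this is an illustrative Python reconstruction, not the study's evaluation code, and the toy data, variable names, and the "true code appears in the top-k" definition of a true positive are all assumptions.

    def confusion_counts(ground_truth, candidates, k):
        """Count TP/FP/FN/TN, treating a phrase as a true positive when the
        reviewers' code appears among the engine's top-k candidates."""
        tp = fp = fn = tn = 0
        for truth, ranked in zip(ground_truth, candidates):
            top_k = ranked[:k]
            if truth is None:          # reviewers unanimously assigned no code
                tn += not top_k        # engine also silent: true negative
                fp += bool(top_k)      # engine coded anyway: false positive
            else:
                tp += truth in top_k
                fn += truth not in top_k
        return tp, fp, fn, tn

    def f_beta(tp, fp, fn, beta=2.0):
        """F2 score: weights recall (sensitivity) beta^2 = 4x as much as precision."""
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        b2 = beta ** 2
        denom = b2 * precision + recall
        return (1 + b2) * precision * recall / denom if denom else 0.0

    # Toy data: per-phrase unanimous reviewer codes vs. engine's ranked lists.
    truth = ["I61.9", "G93.6", None]
    ranked = [["I63.9", "I61.9"], ["G93.6"], []]

    tp, fp, fn, tn = confusion_counts(truth, ranked, k=5)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    print(sensitivity, specificity, f_beta(tp, fp, fn))

Pairwise reviewer agreement of the kind reported above would typically be quantified with Cohen's kappa over the per-phrase code selections. The short- versus full-length comparison plausibly corresponds to truncating codes to their three-character ICD-10 category (e.g., "I61.9" to "I61") before matching, though the abstract does not specify the truncation rule.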
Manual coding by physician reviewers is time-consuming and shows significant variability, while the NLP algorithm's top 5 diagnosis codes are relatively accurate. This preliminary work demonstrates the feasibility of, and potential for, generating diagnosis codes reliably and consistently. Future work may include correlating diagnosis codes with clinical encounter codes to evaluate imaging's impact on, and relevance to, patient care.