From the W.K. Kellogg Eye Center, Department of Ophthalmology and Visual Sciences, University of Michigan, Ann Arbor, Michigan, USA (J.D.S., Y.Z., C.A.A., J.B.); Department of Health Management and Policy, University of Michigan School of Public Health, Ann Arbor, Michigan, USA (J.D.S.).
Am J Ophthalmol. 2024 Jun;262:153-160. doi: 10.1016/j.ajo.2024.01.030. Epub 2024 Feb 1.
Nearly all published ophthalmology-related Big Data studies rely exclusively on International Classification of Diseases (ICD) billing codes to identify patients with particular ocular conditions. However, inaccurate or nonspecific codes may be used. We assessed whether natural language processing (NLP), as an alternative approach, could more accurately identify lens pathology.
Database study comparing the accuracy of NLP versus ICD billing codes to properly identify lens pathology.
We developed an NLP algorithm capable of searching free-text lens exam data in the electronic health record (EHR) to identify the type(s) of cataract present, cataract density, presence of intraocular lenses, and other lens pathology. We applied our algorithm to 17.5 million lens exam records in the Sight Outcomes Research Collaborative (SOURCE) repository. We selected 4314 unique lens-exam entries and asked 11 clinicians to assess whether all pathology present in the entries had been correctly identified in the NLP algorithm output. The algorithm's sensitivity at accurately identifying lens pathology was compared with that of the ICD codes.
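The paper does not publish the algorithm itself, but the kind of free-text search it describes can be sketched with simple pattern matching. The vocabulary, abbreviations, and labels below are illustrative assumptions, not the authors' actual term lists or matching logic:

```python
import re

# Hypothetical lens-pathology vocabulary for illustration only; the published
# algorithm's actual terms, abbreviations, and rules are not reproduced here.
PATTERNS = {
    "nuclear_sclerosis": re.compile(r"\bNS\b|\bnuclear scleros", re.IGNORECASE),
    "posterior_subcapsular": re.compile(r"\bPSC\b|\bposterior subcapsular\b", re.IGNORECASE),
    "cortical": re.compile(r"\bcortical\b", re.IGNORECASE),
    "pseudophakia": re.compile(r"\bPCIOL\b|\bpseudophak", re.IGNORECASE),
    "pseudoexfoliation": re.compile(r"\bPXF\b|\bpseudoexfoliation\b", re.IGNORECASE),
}

def classify_lens_exam(text: str) -> dict:
    """Flag which lens pathologies are mentioned in one free-text exam entry."""
    return {label: bool(pattern.search(text)) for label, pattern in PATTERNS.items()}

# Example free-text lens exam entry, as might appear in an EHR:
findings = classify_lens_exam("2+ NS, trace cortical changes, PCIOL OD")
```

A production algorithm would also need negation handling ("no PSC"), laterality (OD/OS/OU), and density grading, which this sketch omits.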
The NLP algorithm correctly identified all lens pathology present in 4104 of the 4314 lens-exam entries (95.1%). For less common lens pathology, the reviewing clinicians corroborated the algorithm's findings for 100% of mentions of pseudoexfoliation material and 99.7% of mentions of phimosis, subluxation, and synechia. Sensitivity at identifying lens pathology was better for NLP (0.98 [0.96-0.99]) than for billing codes (0.49 [0.46-0.53]).
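The sensitivity estimates and bracketed intervals above come from comparing each method against clinician review. As a sketch of how such a point estimate and 95% confidence interval can be computed (using the Wilson score interval; the counts below are illustrative, not the paper's actual confusion-matrix cells):

```python
import math

def sensitivity_wilson(tp: int, fn: int, z: float = 1.96):
    """Sensitivity = TP / (TP + FN) with a Wilson-score 95% confidence interval."""
    n = tp + fn
    p = tp / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, center - half, center + half

# Illustrative counts chosen to give a sensitivity of 0.98:
sens, lo, hi = sensitivity_wilson(tp=980, fn=20)
```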
Our NLP algorithm identifies and classifies lens abnormalities routinely documented by eye-care professionals with high accuracy. Such algorithms will help researchers to properly identify and classify ocular pathology, broadening the scope of feasible research using real-world data.