Houston VA HSR&D Center of Excellence, Michael E. DeBakey Veterans Affairs Medical Center, Houston, TX, USA.
Dig Dis Sci. 2013 Apr;58(4):936-41. doi: 10.1007/s10620-012-2433-8. Epub 2012 Oct 21.
Using electronic medical records (EMR) to differentiate colorectal cancer surveillance colonoscopy from non-surveillance colonoscopy in patients with inflammatory bowel disease (IBD) is important for practice improvement and research, but algorithms based on diagnosis codes are lacking. The automated retrieval console (ARC) is natural language processing (NLP)-based software that allows text-based document-level classification.
The purpose of this study was to test the feasibility and accuracy of ARC in identifying surveillance and non-surveillance colonoscopy in IBD using EMR.
We performed a split validation study of electronic colonoscopy pathology reports for patients with IBD from the Michael E. DeBakey VA Medical Center. A gastroenterologist manually classified each pathology report as derived from either surveillance or non-surveillance colonoscopy. Pathology reports were randomly split into two sets: 70% for algorithm derivation and 30% for validation. An ARC-generated classification model was applied to the validation set of pathology reports. The model's classification of surveillance and non-surveillance colonoscopy was compared with the manual classification.
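The random split described above can be sketched as follows. This is a minimal illustration and not the study's actual procedure: the function name `split_reports`, the integer report identifiers, and the seed are assumptions, and the training size of 400 is taken from the counts reported in the results (roughly 70% of 575).

```python
import random


def split_reports(report_ids, n_train, seed=0):
    """Randomly split report identifiers into derivation and validation sets.

    Hypothetical helper: the abstract does not describe how the split
    was implemented, only that reports were split at random.
    """
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(report_ids)    # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder identifiers standing in for the 575 pathology reports.
report_ids = list(range(575))
train_ids, test_ids = split_reports(report_ids, n_train=400)
print(len(train_ids), len(test_ids))
```

The split here is at the report level, matching the abstract's wording; whether reports from the same patient were kept together is not stated.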
A total of 575 colonoscopy pathology reports were available for 195 patients with IBD; 400 reports were assigned to the training set and 175 to the testing set. Within the testing set, manual review classified 69 pathology reports as surveillance, whereas the ARC model classified 66 reports as surveillance, for a recall of 0.77, precision of 0.80, and specificity of 0.88.
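The reported metrics can be checked against the counts with a short calculation. The abstract does not state the number of true positives. The value of 53 used below is an assumption inferred from the reported recall: it is the only whole number of true positives out of 69 that rounds to 0.77. The remaining cells of the confusion matrix follow from the reported totals.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (recall, precision, specificity) from confusion-matrix counts."""
    recall = tp / (tp + fn)            # share of manual surveillance reports found
    precision = tp / (tp + fp)         # share of model surveillance calls that are correct
    specificity = tn / (tn + fp)       # share of non-surveillance reports correctly excluded
    return recall, precision, specificity


total_test = 175      # reports in the testing set (from the abstract)
manual_pos = 69       # surveillance by manual review (from the abstract)
model_pos = 66        # surveillance by the ARC model (from the abstract)

tp = 53               # ASSUMPTION: inferred from recall 0.77, not stated in the abstract
fp = model_pos - tp   # model says surveillance, manual review says not
fn = manual_pos - tp  # manual review says surveillance, model says not
tn = total_test - tp - fp - fn

recall, precision, specificity = classification_metrics(tp, fp, fn, tn)
print(f"recall={recall:.2f} precision={precision:.2f} specificity={specificity:.2f}")
```

Under this assumption, the three computed values round to the figures reported in the abstract.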
ARC was able to identify surveillance colonoscopy for IBD without customized software programming. NLP-based document-level classification may be used to differentiate surveillance from non-surveillance colonoscopy in IBD.