Musselman Reilly P, Rothwell Deanna, Auer Rebecca C, Moloo Husein, Boushey Robin P, van Walraven Carl
Division of General Surgery, University of Ottawa, Ottawa, ON, Canada.
Department of Epidemiology and Community Medicine, Ottawa Hospital Research Institute, Ottawa, ON, Canada.
J Pathol Inform. 2018 May 2;9:18. doi: 10.4103/jpi.jpi_71_17. eCollection 2018.
The aim of this study is to derive and to validate a cohort of rectal cancer surgical patients within administrative datasets using text-search analysis of pathology reports.
A text-search algorithm was developed and validated on pathology reports from 694 known rectal cancers, 1000 known colon cancers, and 1000 noncolorectal specimens. The algorithm was applied to all pathology reports available within the Ottawa Hospital Data Warehouse from 1996 to 2010. Identified pathology reports were validated as rectal cancer specimens through manual chart review. Sensitivity, specificity, and positive predictive value (PPV) of the text-search methodology were calculated.
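The abstract does not list the search terms used, so the following is only a minimal sketch of what a keyword-based text search over pathology reports could look like. The patterns `SITE_TERMS` and `MALIGNANCY_TERMS` and the function `flag_report` are hypothetical illustrations, not the authors' algorithm.

```python
import re

# Hypothetical term lists: the study's actual search terms are not given
# in the abstract, so these are illustrative assumptions only.
SITE_TERMS = re.compile(r"\b(rectum|rectal|rectosigmoid)\b", re.IGNORECASE)
MALIGNANCY_TERMS = re.compile(r"\b(adenocarcinoma|carcinoma)\b", re.IGNORECASE)


def flag_report(report_text: str) -> bool:
    """Flag a report that mentions both a rectal site and a malignancy term.

    This deliberately favours sensitivity: it does not handle negation
    ("no evidence of carcinoma") or history-only mentions, which is one
    reason flagged reports would still need manual review.
    """
    has_site = SITE_TERMS.search(report_text) is not None
    has_malignancy = MALIGNANCY_TERMS.search(report_text) is not None
    return has_site and has_malignancy


if __name__ == "__main__":
    sample = "Rectum, low anterior resection: invasive adenocarcinoma."
    print(flag_report(sample))
```

A permissive rule of this kind would be expected to miss few true cases while flagging many reports that are not rectal cancer, which is consistent with the need for chart review described in the abstract.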
In the derivation cohort of pathology reports (n = 2694), the text-search algorithm had a sensitivity and specificity of 100% and 98.6%, respectively. When this algorithm was applied to all pathology reports from 1996 to 2010 (n = 284,032), 5588 pathology reports were identified as consistent with rectal cancer. Medical record review determined that 4550 patients did not have rectal cancer, leaving a final cohort of 1038 rectal cancer patients. Sensitivity and specificity of the text-search algorithm were 100% and 98.4%, respectively. PPV of the algorithm was 18.6%.
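The reported figures can be checked with a short calculation. This sketch assumes that reports and patients can be counted interchangeably (as the abstract does) and that there were no false negatives, which is implied by the reported 100% sensitivity rather than stated as a count.

```python
# Counts taken from the abstract.
total_reports = 284_032   # all pathology reports, 1996 to 2010
flagged = 5_588           # reports identified by the text search
false_positives = 4_550   # flagged, but no rectal cancer on chart review

# Derived counts. false_negatives = 0 is an assumption implied by the
# reported 100% sensitivity; the abstract does not state it directly.
true_positives = flagged - false_positives
false_negatives = 0
true_negatives = total_reports - flagged - false_negatives

sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)
ppv = true_positives / (true_positives + false_positives)

print(f"true positives: {true_positives}")
print(f"sensitivity:    {sensitivity:.1%}")
print(f"specificity:    {specificity:.1%}")
print(f"PPV:            {ppv:.1%}")
```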
Text-search methodology is a feasible way to identify all rectal cancer surgery patients through administrative datasets with high sensitivity and specificity. However, in the presence of a low pretest probability, text-search methods must be combined with a validation method, such as manual chart review, to be a viable approach.
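The dependence of PPV on pretest probability follows from Bayes' theorem. The sketch below uses the rounded sensitivity and specificity reported above and takes the prevalence as 1038 of 284,032 reports; the function name `ppv_from_prevalence` and the 50% comparison prevalence are illustrative assumptions, not values from the study.

```python
def ppv_from_prevalence(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_pos_rate = sensitivity * prevalence
    false_pos_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos_rate / (true_pos_rate + false_pos_rate)


# Prevalence in the full data warehouse, from the counts in the abstract.
study_prevalence = 1038 / 284_032

# Rounded sensitivity and specificity as reported in the abstract.
study_ppv = ppv_from_prevalence(1.0, 0.984, study_prevalence)

# Hypothetical comparison: the same test applied where half of all
# reports are rectal cancer (an assumed value for illustration only).
balanced_ppv = ppv_from_prevalence(1.0, 0.984, 0.5)

print(f"prevalence {study_prevalence:.2%} -> PPV {study_ppv:.1%}")
print(f"prevalence 50.00% -> PPV {balanced_ppv:.1%}")
```

With a prevalence well under 1%, even a 1.6% false-positive rate generates several false positives for each true case, which is why the abstract pairs the text search with manual chart review.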