Lacson Ronilda, Harris Kimberly, Brawarsky Phyllis, Tosteson Tor D, Onega Tracy, Tosteson Anna N A, Kaye Abby, Gonzalez Irina, Birdwell Robyn, Haas Jennifer S
Department of Radiology, Brigham and Women's Hospital, 75 Francis Street, Boston, MA, 02115, USA.
Harvard Medical School, Boston, MA, USA.
J Digit Imaging. 2015 Oct;28(5):567-75. doi: 10.1007/s10278-014-9762-4.
Breast cancer screening is central to early breast cancer detection. Identifying and monitoring process measures for screening is a focus of the National Cancer Institute's Population-based Research Optimizing Screening through Personalized Regimens (PROSPR) initiative, which requires participating centers to report structured data across the cancer screening continuum. We evaluated the accuracy of automated information extraction of imaging findings from radiology reports, which are available only as unstructured text. We estimated the prevalence of imaging findings for breast imaging received by women who obtained care in a primary care network participating in PROSPR (n = 139,953 radiology reports) and compared automatically extracted data elements to a "gold standard" based on manual review of a validation sample of 941 randomly selected radiology reports, including mammograms, digital breast tomosynthesis, ultrasound, and magnetic resonance imaging (MRI). The prevalence of imaging findings varied by data element and modality (e.g., suspicious calcification was noted in 2.6% of screening mammograms, 12.1% of diagnostic mammograms, and 9.4% of tomosynthesis exams). In the validation sample, recall, precision, and F-measure for identifying imaging findings ranged from 0.8 to 1.0. The findings evaluated were suspicious calcifications, masses, and architectural distortion (on mammogram and tomosynthesis); masses, cysts, non-mass enhancement, and enhancing foci (on MRI); and masses and cysts (on ultrasound). Information extraction tools can be used to accurately document imaging findings as structured data elements from text reports for a variety of breast imaging modalities. These data can be used to populate screening registries to help elucidate more effective breast cancer screening processes.
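To make the validation metrics concrete, the following is a minimal sketch of how recall, precision, and F-measure could be computed for one data element (for example, "suspicious calcification present") by comparing automatically extracted labels against manually reviewed gold-standard labels. The function name `precision_recall_f`, the boolean label encoding, and the toy labels are illustrative assumptions, not the authors' implementation or data, and the F-measure is assumed here to be the balanced F1 (harmonic mean of precision and recall).

```python
from typing import Sequence, Tuple


def precision_recall_f(
    extracted: Sequence[bool], gold: Sequence[bool]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one binary data element.

    extracted[i] is the automated label for report i and gold[i] is the
    manual-review label for the same report (hypothetical encoding).
    """
    if len(extracted) != len(gold):
        raise ValueError("extracted and gold must have the same length")

    pairs = list(zip(extracted, gold))
    tp = sum(1 for e, g in pairs if e and g)        # finding found by both
    fp = sum(1 for e, g in pairs if e and not g)    # extracted, not in gold
    fn = sum(1 for e, g in pairs if not e and g)    # in gold, missed

    # Guard against empty denominators by reporting 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Toy labels for five reports (illustrative only, not study data).
toy_extracted = [True, True, False, True, False]
toy_gold = [True, False, False, True, True]
toy_precision, toy_recall, toy_f1 = precision_recall_f(toy_extracted, toy_gold)
```

In this toy example there are two true positives, one false positive, and one false negative, so all three metrics equal 2/3. In a validation like the one described, the same calculation would be repeated per data element and per modality.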