Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA, USA.
Institute for Computational Health Sciences, University of California, San Francisco, CA, USA.
J Digit Imaging. 2019 Feb;32(1):30-37. doi: 10.1007/s10278-018-0105-8.
Breast cancer is a leading cause of cancer death among women in the USA. Screening mammography is effective in reducing mortality but has a high rate of unnecessary recalls and biopsies. Deep learning can be applied to mammography, but it requires large-scale labeled datasets, which are difficult to obtain. We aim to remove many barriers to dataset development by automatically harvesting data from existing clinical records using a hybrid framework that combines traditional NLP with IBM Watson. An expert reviewer manually annotated 3521 breast pathology reports with one of four outcomes: left positive, right positive, bilateral positive, or negative. Traditional NLP techniques using seven different machine learning classifiers were compared to IBM Watson's automated natural language classifier (NLC). Techniques were evaluated using precision, recall, and F-measure. Logistic regression outperformed all other traditional machine learning classifiers and was used for subsequent comparisons. Both traditional NLP and Watson's NLC performed well for cases under 1024 characters, with weighted average F-measures above 0.96 across all classes. Performance of traditional NLP was lower for cases over 1024 characters, with an F-measure of 0.83. We demonstrate a hybrid framework using traditional NLP techniques combined with IBM Watson to annotate over 10,000 breast pathology reports for development of a large-scale database to be used for deep learning in mammography. Our work shows that traditional NLP and IBM Watson perform extremely well for cases under 1024 characters and can accelerate the rate of data annotation.
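The weighted average precision, recall, and F-measure used for evaluation can be illustrated with a minimal sketch. The four class names come from the abstract; the function names and the toy labels are hypothetical and purely illustrative, not the authors' data or code. The sketch assumes the common definition in which each class's score is weighted by its share of the true labels.

```python
from collections import Counter

# The four outcomes named in the abstract.
LABELS = ["left positive", "right positive", "bilateral positive", "negative"]


def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, F-measure) for one class, one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def weighted_scores(y_true, y_pred, labels=LABELS):
    """Average per-class scores, weighting each class by its true-label count."""
    support = Counter(y_true)
    total = len(y_true)
    w_precision = w_recall = w_f = 0.0
    for label in labels:
        p, r, f = per_class_scores(y_true, y_pred, label)
        weight = support[label] / total
        w_precision += weight * p
        w_recall += weight * r
        w_f += weight * f
    return w_precision, w_recall, w_f


# Toy labels (assumed for illustration only, not from the study).
y_true = ["negative", "negative", "left positive",
          "right positive", "bilateral positive", "negative"]
y_pred = ["negative", "left positive", "left positive",
          "right positive", "bilateral positive", "negative"]

wp, wr, wf = weighted_scores(y_true, y_pred)
print(f"weighted precision={wp:.3f} recall={wr:.3f} F-measure={wf:.3f}")
```

Weighting by class support keeps a frequent class such as negative from being under-represented in the summary score. To reproduce the length-stratified comparison, the same function could be applied separately to reports under and over 1024 characters.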