Wu Wenjun, Li Beibin, Mercan Ezgi, Mehta Sachin, Bartlett Jamen, Weaver Donald L, Elmore Joann G, Shapiro Linda G
Department of Medical Education and Biomedical Informatics, University of Washington, Seattle, WA.
Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA.
JCO Clin Cancer Inform. 2020 Mar;4:290-298. doi: 10.1200/CCI.19.00129.
Machine Learning Package for Cancer Diagnosis (MLCD) is the result of a National Institutes of Health/National Cancer Institute (NIH/NCI)-sponsored project to build a unified software package from state-of-the-art breast cancer biopsy diagnosis and machine learning algorithms, with the aim of improving the quality of both clinical practice and ongoing research.
Whole-slide images of 240 well-characterized breast biopsy cases, initially assembled under R01 CA140560, were used to develop the algorithms and train the machine learning models. The software package is based on methodology developed and published under our recent NIH/NCI-sponsored research grant (R01 CA172343). That methodology covers three steps: finding regions of interest (ROIs) in whole-slide breast biopsy images, segmenting the ROIs into histopathologic tissue types, and using this segmentation in classifiers that can suggest final diagnoses.
The package provides an ROI detector for whole-slide images, a module for semantic segmentation of the ROIs into tissue classes, and a module for diagnostic classification of the ROIs into 4 classes (benign, atypia, ductal carcinoma in situ, invasive cancer). It is available through a GitHub repository under the MIT License and will later be distributed with the Pathology Image Informatics Platform system. A Web page provides instructions for use.
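The three stages described above (ROI detection, semantic segmentation, diagnostic classification) can be pictured as a simple composition of functions. The following Python sketch is illustrative only and is not the MLCD API: the names `run_pipeline`, `RoiResult`, `crop`, and the `toy_*` functions are hypothetical, and the toy stages are trivial placeholders with no diagnostic meaning. Only the order of the stages and the four class names come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# The four diagnostic classes named in the abstract.
DIAGNOSES = ("benign", "atypia", "ductal carcinoma in situ", "invasive cancer")

Image = List[List[int]]          # toy stand-in for an image: a 2-D grid of intensities
Box = Tuple[int, int, int, int]  # hypothetical ROI format: (row, col, height, width)


@dataclass
class RoiResult:
    box: Box           # where the ROI was found on the slide
    tissue_map: Image  # per-pixel tissue label for the ROI
    diagnosis: str     # one of DIAGNOSES


def crop(image: Image, box: Box) -> Image:
    """Cut the ROI described by `box` out of `image`."""
    r, c, h, w = box
    return [row[c:c + w] for row in image[r:r + h]]


def run_pipeline(
    slide: Image,
    detect_rois: Callable[[Image], List[Box]],
    segment: Callable[[Image], Image],
    classify: Callable[[Image], int],
) -> List[RoiResult]:
    """Chain the stages: detect ROIs, segment each ROI, classify the segmentation."""
    results = []
    for box in detect_rois(slide):
        tissue_map = segment(crop(slide, box))
        class_index = classify(tissue_map)
        results.append(RoiResult(box, tissue_map, DIAGNOSES[class_index]))
    return results


# --- Placeholder stages (assumptions for illustration, NOT the MLCD models) ---

def toy_detect(slide: Image) -> List[Box]:
    # Treat the whole slide as a single ROI.
    return [(0, 0, len(slide), len(slide[0]))]


def toy_segment(patch: Image) -> Image:
    # Label a pixel 1 if its intensity exceeds 127, otherwise 0.
    return [[1 if v > 127 else 0 for v in row] for row in patch]


def toy_classify(tissue_map: Image) -> int:
    # Map the fraction of label-1 pixels onto a class index (no medical meaning).
    flat = [v for row in tissue_map for v in row]
    frac = sum(flat) / len(flat)
    return min(int(frac * len(DIAGNOSES)), len(DIAGNOSES) - 1)


if __name__ == "__main__":
    slide = [[0, 200], [50, 255]]
    for result in run_pipeline(slide, toy_detect, toy_segment, toy_classify):
        print(result.box, result.diagnosis)
```

Passing the stages in as callables keeps the sketch independent of any particular model: in practice each placeholder would be replaced by the corresponding trained component, whose actual interface is documented on the package's Web page rather than here.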
Our tools have the potential to help other cancer researchers and, ultimately, practicing physicians, and may motivate future research in this field. This article describes the methodology behind the software development and gives sample outputs to guide those interested in using the package.