Zeng Yan, Zhang Jinmiao
Guanganmen Hospital, China Academy of Chinese Medical Sciences, China.
Cardinal Health, Inc., USA.
Comput Biol Med. 2020 Jul;122:103861. doi: 10.1016/j.compbiomed.2020.103861. Epub 2020 Jun 13.
This study aims to assess the feasibility of AutoML technology for identifying invasive ductal carcinoma (IDC) in whole slide images (WSI).
The study presents an experimental machine learning (ML) model built with Google Cloud AutoML Vision rather than a handcrafted neural network. A public dataset of 278,124 labeled histopathology images serves as the original dataset for model creation. To balance the number of positive and negative IDC samples, the study augments this dataset by rotating a large portion of the positive image samples. As a result, a total of 378,215 labeled images are used.
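The rotation-based balancing described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `augment_positives`, the use of NumPy, and the restriction to quarter-turn rotations are all assumptions made for illustration, since the abstract does not state the rotation angles or the tooling used.

```python
import numpy as np

def augment_positives(patches, n_extra, seed=0):
    """Return n_extra rotated copies drawn from the positive patches.

    Hypothetical helper: `patches` is a list of H x W x C arrays.
    Each copy is rotated by 90, 180, or 270 degrees (an assumed
    choice; the abstract only says the images were rotated).
    """
    rng = np.random.default_rng(seed)
    # Sample with replacement only if more copies are requested
    # than there are source patches.
    idx = rng.choice(len(patches), size=n_extra, replace=n_extra > len(patches))
    turns = rng.integers(1, 4, size=n_extra)  # 1, 2, or 3 quarter-turns
    return [np.rot90(patches[i], k=int(k)) for i, k in zip(idx, turns)]
```

The figures in the abstract imply that 378,215 − 278,124 = 100,091 images were added by augmentation; under this sketch, that count would be passed as `n_extra`.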
The model achieves an average accuracy of 91.6% during evaluation, measured as the area under the precision-recall curve (AuPRC). A subsequent test on a held-out dataset, unseen by the model, yields a balanced accuracy of 84.6%. These results outperform those reported in earlier studies. Similar performance is observed in a generalization test with new breast tissue samples we collected from the hospital.
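For readers unfamiliar with the metric, balanced accuracy is the mean of sensitivity and specificity, which makes it less sensitive to class imbalance than plain accuracy. The sketch below uses the standard definition; the function name and the toy labels are illustrative assumptions and are not data from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = IDC positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return (sensitivity + specificity) / 2

# Toy labels (made up for illustration only).
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)
```

In this toy case the sensitivity is 0.5 and the specificity is 0.75, so the balanced accuracy is 0.625, whereas plain accuracy would be 4/6.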
The results of this study demonstrate the maturity and feasibility of an AutoML approach for IDC identification. The study also shows the advantage of the AutoML approach when combined at scale with cloud computing.