Department of Computer Science, Colorado School of Mines, Golden, CO 80401, USA.
Bioinformatics. 2022 Jun 24;38(Suppl 1):i92-i100. doi: 10.1093/bioinformatics/btac267.
Breast cancer develops in breast tissue and, after skin cancer, is the most commonly diagnosed cancer in women in the United States. Because early diagnosis is imperative to prevent breast cancer progression, many machine learning models have been developed in recent years to automate the histopathological classification of the different types of carcinomas. However, many of them do not scale to large datasets.
In this study, we propose the novel Primal-Dual Multi-Instance Support Vector Machine to determine which tissue segments in an image exhibit an indication of an abnormality. We derive an efficient optimization algorithm for the proposed objective that bypasses the quadratic programming and least-squares problems commonly employed to optimize Support Vector Machine models. The proposed method is computationally efficient and therefore scales to large datasets. We applied our method to the public BreaKHis dataset and achieved promising prediction performance and scalability for histopathological classification.
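To illustrate the multi-instance setting described above, the following is a minimal sketch of the standard multi-instance assumption only, not the primal-dual optimization algorithm proposed in the paper. It treats an image as a bag of tissue-segment feature vectors, scores each segment with a linear model, and labels the image abnormal if at least one segment scores above zero. The function names, the weights `w` and `b`, and the toy feature values are all hypothetical and chosen purely for illustration.

```python
import numpy as np


def score_instances(bag, w, b):
    """Return one linear score per instance (tissue segment) in the bag."""
    return bag @ w + b


def predict_bag(bag, w, b):
    """Apply the standard multi-instance assumption.

    A bag (image) is labeled +1 if any instance scores above zero,
    otherwise -1. The indices of the flagged instances are returned
    as well, since they indicate which segments drove the decision.
    """
    scores = score_instances(bag, w, b)
    flagged = np.flatnonzero(scores > 0)
    label = 1 if flagged.size > 0 else -1
    return label, flagged


# Hypothetical weights and two toy bags with 2-dimensional features.
w = np.array([1.0, -1.0])
b = -0.5

abnormal_bag = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
normal_bag = np.array([[0.0, 0.0], [0.0, 1.0]])

abnormal_label, abnormal_flagged = predict_bag(abnormal_bag, w, b)
normal_label, normal_flagged = predict_bag(normal_bag, w, b)
```

In this toy setup only the second segment of `abnormal_bag` has a positive score, so it is the one flagged. How the weights are actually learned is the subject of the paper and is not reproduced here.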
Software is publicly available at: https://1drv.ms/u/s!AiFpD21bgf2wgRLbQq08ixD0SgRD?e=OpqEmY.
Supplementary data are available at Bioinformatics online.