Computer Aided Breast Cancer Detection Using Ensembling of Texture and Statistical Image Features.

Affiliations

Department of Metallurgical and Material Engineering, Jadavpur University, Kolkata 700032, India.

Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India.

Publication information

Sensors (Basel). 2021 May 23;21(11):3628. doi: 10.3390/s21113628.

Abstract

Breast cancer is a potentially fatal disease that claims more than half a million lives every year. In 2020, it overtook lung cancer as the most commonly diagnosed form of cancer. Although the disease is often deadly, survival rates and life expectancy improve substantially with early detection and diagnosis. The treatment protocol also varies with the stage of the cancer. Diagnosis is typically performed on histopathological slides, from which it is possible to determine whether the tissue is in the Ductal Carcinoma In Situ (DCIS) stage, in which the cancerous cells have not spread into the surrounding breast tissue, or in the Invasive Ductal Carcinoma (IDC) stage, in which the cells have penetrated the neighboring tissue. IDC detection is extremely time-consuming and challenging for physicians. It can therefore be modeled as an image classification task in which pattern recognition and machine learning help doctors and medical practitioners make such crucial decisions. In the present paper, we use an IDC breast cancer dataset of 277,524 images (78,786 IDC-positive and 198,738 IDC-negative) to classify images as IDC(+) or IDC(-). To that end, we use feature extractors covering textural features, such as SIFT, SURF and ORB, and statistical features, such as Haralick texture features. The extracted features are combined to yield a dataset of 782 features. These features are ensembled by stacking with several machine learning classifiers (Random Forest, Extra Trees, XGBoost, AdaBoost, CatBoost and Multi-Layer Perceptron), followed by feature selection using the Pearson correlation coefficient, which yields a dataset with four features that are then used for classification. In our experiments, CatBoost yielded the highest accuracy (92.55%), which is on par with other state-of-the-art results, most of which employ deep learning architectures.
The source code is available in the GitHub repository.
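
The abstract does not say exactly how the Pearson correlation coefficient is applied, so the following is only a minimal sketch under one plausible reading: each candidate feature is scored by the absolute value of its Pearson correlation with the class label, and the four highest-scoring features are kept. The helper names (`pearson`, `select_top_k`), the column names (`base_0` to `base_5`) and the synthetic data are all hypothetical and are not taken from the paper or its repository.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_top_k(columns, labels, k):
    """Return the names of the k columns with the largest |r| against labels."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Synthetic stand-in data (an assumption, not the paper's data): six columns
# that mimic stacked base-model outputs, each equal to the binary label plus
# uniform noise of a different width.
rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(500)]
noise_width = {
    "base_0": 0.2, "base_1": 0.3, "base_2": 0.4,
    "base_3": 0.5, "base_4": 2.0, "base_5": 3.0,
}
columns = {
    name: [y + rng.uniform(-width, width) for y in labels]
    for name, width in noise_width.items()
}

# Keep four features, matching the count reported in the abstract.
selected = select_top_k(columns, labels, k=4)
print(selected)
```

In this toy setup the two noisiest columns have the weakest linear relationship with the label, so they are the ones dropped. An alternative reading, in which features are removed because they correlate strongly with each other rather than weakly with the label, would need a different selection rule.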

