LaTIM, INSERM, Université de Bretagne-Occidentale, Brest, France.
LARIS, Université d'Angers, Angers, France.
Sci Rep. 2023 Nov 16;13(1):20014. doi: 10.1038/s41598-023-46239-0.
This study develops a robust pipeline for classifying invasive ductal carcinomas and benign tumors in histopathological images while addressing variability within and between centers. We specifically tackle two challenges: detecting atypical data and handling variability between common clusters within the same database. Our feature engineering-based pipeline comprises a feature extraction step followed by multiple harmonization techniques that correct intra- and inter-center batch effects arising from image acquisition variability and diverse patient clinical characteristics. These harmonization steps facilitate the construction of more robust and efficient models. We assess the pipeline on two public breast cancer databases, BreaKHis and IDCDB, using recall, precision, and accuracy. The pipeline outperforms recent models, achieving 90-95% accuracy in classifying benign and malignant tumors, and we demonstrate the advantage of harmonization for classifying patches from different databases. Our top model scored 94.7% on IDCDB and 95.2% on BreaKHis, surpassing existing feature engineering-based models (92.1% on IDCDB and 87.7% on BreaKHis) and attaining performance comparable to deep learning models. By incorporating several harmonization techniques, the proposed pipeline classifies malignant and benign tumors effectively while accounting for variability within and between centers. Our findings show that harmonizing variability between patches from different batches directly affects the training and testing performance of classification models. This pipeline has the potential to enhance breast cancer diagnosis and treatment and may be applicable to other diseases.
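The abstract does not name the specific harmonization techniques used, so the following is only a minimal sketch of the general idea, assuming the simplest possible location-scale correction: each feature is z-scored within its own batch (for example, its acquisition center) so that batch-level offsets and scale differences are removed before classification. This is a stand-in for illustration, not the authors' method. The function names `harmonize_by_batch` and `binary_metrics` are hypothetical, and the second function simply computes the three reported metrics (recall, precision, accuracy) from their standard definitions.

```python
from statistics import mean, pstdev


def harmonize_by_batch(features, batches):
    """Z-score every feature within its batch (hypothetical helper).

    features: list of equal-length feature vectors, one per image patch
    batches:  list of batch labels (e.g. acquisition center), same length
    Returns a new list of harmonized feature vectors.
    """
    n_features = len(features[0])
    harmonized = [list(row) for row in features]
    for batch in set(batches):
        idx = [i for i, b in enumerate(batches) if b == batch]
        for j in range(n_features):
            column = [features[i][j] for i in idx]
            mu, sigma = mean(column), pstdev(column)
            for i in idx:
                # A feature that is constant within a batch carries no
                # within-batch information, so it is mapped to 0.
                harmonized[i][j] = (features[i][j] - mu) / sigma if sigma > 0 else 0.0
    return harmonized


def binary_metrics(y_true, y_pred, positive=1):
    """Recall, precision, and accuracy for a binary task (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }


if __name__ == "__main__":
    # Toy data: batch "B" has a large offset on the first feature.
    feats = [[1.0, 10.0], [3.0, 30.0], [101.0, 5.0], [103.0, 7.0]]
    print(harmonize_by_batch(feats, ["A", "A", "B", "B"]))
    print(binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

In the toy data, the two batches differ by an offset of 100 on the first feature; after per-batch z-scoring, corresponding patches from both batches land on the same scale. More elaborate harmonization methods also try to preserve biological covariates while removing batch effects, which this sketch does not attempt.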