Pan Shi, Secrier Maria
Department of Genetics, Evolution and Environment, UCL Genetics Institute, University College London, London WC1E 6BT, UK.
iScience. 2023 Sep 27;26(10):108073. doi: 10.1016/j.isci.2023.108073. eCollection 2023 Oct 20.
Hematoxylin and eosin (H&E) stained slides are widely used in disease diagnosis. Remarkable advances in deep learning have made it possible to detect complex molecular patterns in these histopathology slides, suggesting that automated approaches could help inform pathologists' decisions. Multiple instance learning (MIL) algorithms have shown promise in this context, outperforming transfer learning (TL) methods for various tasks, but their implementation and usage remain complex. We introduce HistoMIL, a Python package designed to streamline the implementation, training and inference of MIL-based algorithms for computational pathologists and biomedical researchers. It integrates a self-supervised learning module for feature encoding and a full pipeline encompassing TL and three MIL algorithms: ABMIL, DSMIL, and TransMIL. The PyTorch Lightning framework enables effortless customization and algorithm implementation. We illustrate HistoMIL's capabilities by building predictive models for 2,487 cancer hallmark genes on breast cancer histology slides, achieving AUROC values of up to 85%.
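To make the MIL idea concrete, here is a minimal NumPy sketch of the non-gated attention pooling introduced by Ilse et al. (2018), on which ABMIL is based. It is not HistoMIL's API or its exact implementation. A slide is treated as a bag of patch embeddings. Each patch receives an attention score, and the softmax-normalized scores weight the patches into a single slide-level embedding, which a classifier then scores. The function names `abmil_pool` and `predict_bag`, the weight shapes, the logistic head and the random toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def abmil_pool(H, V, w):
    """Non-gated attention-based MIL pooling (after Ilse et al., 2018).

    H: (K, D) array of K patch (instance) embeddings from one slide (the bag).
    V: (L, D) projection matrix; w: (L,) attention vector (hypothetical shapes).
    Returns the bag embedding z with shape (D,) and attention weights a with shape (K,).
    """
    # Unnormalized attention score per instance: w^T tanh(V h_k)
    scores = np.tanh(H @ V.T) @ w            # shape (K,)
    # Softmax over instances; subtracting the max keeps exp() numerically stable
    scores = scores - scores.max()
    a = np.exp(scores) / np.exp(scores).sum()
    # Bag embedding: attention-weighted sum of the instance embeddings
    z = a @ H                                # shape (D,)
    return z, a

def predict_bag(H, V, w, c, b):
    """Slide-level probability from a logistic head on z (hypothetical classifier)."""
    z, a = abmil_pool(H, V, w)
    p = 1.0 / (1.0 + np.exp(-(z @ c + b)))
    return p, a

# Toy example: 50 patches with 16-dimensional features (random, illustrative only)
rng = np.random.default_rng(0)
H = rng.normal(size=(50, 16))
V = rng.normal(size=(8, 16))
w = rng.normal(size=8)
c = rng.normal(size=16)
p, a = predict_bag(H, V, w, c, 0.0)
```

Because the softmax and the weighted sum run over all patches, the slide embedding does not depend on patch order. The attention weights `a` can also be inspected to see which patches drove a prediction. In a real pipeline, `V`, `w` and the classifier head would be learned end to end from slide-level labels rather than sampled at random.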