Kim Hee E, Maros Mate E, Siegel Fabian, Ganslandt Thomas
Department of Biomedical Informatics, Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68167 Mannheim, Germany.
Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany.
Biomedicines. 2022 Nov 4;10(11):2808. doi: 10.3390/biomedicines10112808.
Despite the emergence of mobile health and the success of deep learning (DL), deploying production-ready DL models to resource-limited devices remains challenging. In particular, the speed of DL models at inference time becomes relevant. We aimed to accelerate inference time for Gram-stained analysis, a tedious and manual task involving microorganism detection on whole-slide images. Three DL models were optimized in three steps: transfer learning, pruning, and quantization, and were then evaluated on two Android smartphones. Most convolutional layers (≥80%) had to be retrained for adaptation to the Gram-stained classification task. The combination of pruning and quantization demonstrated its utility in reducing model size and inference time without compromising model quality. Pruning mainly contributed to model size reduction by 15×, while quantization reduced inference time by 3× and decreased model size by 4×. The combination of the two reduced the baseline model by an overall factor of 46×. Optimized models were smaller than 6 MB and were able to process one image in <0.6 s on a Galaxy S10. Our findings demonstrate that model compression methods are highly relevant for the successful deployment of DL solutions to resource-limited devices.
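The two compression steps named above can be illustrated on a toy weight vector. This is a hypothetical, framework-free sketch (not the authors' code): magnitude pruning zeroes the smallest-magnitude weights, and 8-bit post-training quantization maps the remaining float32 values to int8 with a single scale, which by itself accounts for the 4× size reduction reported in the abstract.

```python
# Hypothetical sketch of magnitude pruning + 8-bit affine quantization
# on a toy weight vector; all names here are illustrative assumptions.
import struct

def prune(weights, sparsity=0.8):
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k - 1] if k else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize(weights, bits=8):
    """Map floats to signed integers using one symmetric scale factor."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for int8
    scale = (max(abs(w) for w in weights) / qmax) or 1.0
    return [round(w / scale) for w in weights], scale

weights = [0.01, -0.9, 0.05, 0.7, -0.02, 0.3, -0.04, 0.8]
pruned = prune(weights, sparsity=0.5)               # 4 of 8 weights become 0
q, scale = quantize(pruned)                         # int8 codes + one fp scale

fp32_bytes = len(weights) * struct.calcsize("f")    # 4 bytes per float32
int8_bytes = len(q) * 1                             # 1 byte per int8 code
print(fp32_bytes // int8_bytes)                     # → 4 (quantization alone)
```

In production toolchains (e.g. TensorFlow Lite or PyTorch mobile) the zeroed weights additionally compress well or can be stored sparsely, which is how pruning and quantization compound to the overall 46× reduction reported.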