Penpong Nutsuda, Wanna Yupaporn, Kamjanlard Cristakan, Techasen Anchalee, Intharah Thanapong
Visual Intelligence Laboratory, Department of Statistics, Faculty of Science, Khon Kaen University, Khon Kaen, Thailand.
Cholangiocarcinoma Research Institute, Khon Kaen University, Khon Kaen, Thailand.
Heliyon. 2024 Feb 13;10(4):e26153. doi: 10.1016/j.heliyon.2024.e26153. eCollection 2024 Feb 29.
The out-of-domain (OO-Do) problem hinders machine learning models, especially when they are deployed in the real world. It occurs during the testing phase, when a trained model must make predictions on data belonging to a class different from those of the data used for training. We tackle the OO-Do problem in an object-detection task: a parasite-egg detection model used in real-world situations. First, we introduce the In-the-wild parasite-egg dataset to evaluate OO-Do-aware models. The dataset contains 1,552 images (1,049 parasite-egg images and 503 OO-Do images) uploaded through a chatbot. It was constructed by conducting a chatbot test session with 222 medical technology students. Next, we propose a data-driven framework for constructing a parasite-egg recognition model for in-the-wild applications that addresses the OO-Do issue. In the framework, we use publicly available datasets to teach the parasite-egg recognition models both in-domain and out-of-domain concepts. Finally, we compare integration strategies for our proposed two-step parasite-egg detection approach on two test sets: the standard and In-the-wild datasets. We also investigate different thresholding strategies for making the model robust to OO-Do data. Experiments on the two test datasets showed that placing an OO-Do-aware classification model after an object-detection model achieved outstanding performance in detecting parasite eggs. The framework improved the F1-score over the baselines by 7.37% and 4.09% on the Chula+Wild dataset and the In-the-wild parasite-egg dataset, respectively.
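The two-step idea described in the abstract (an object detector followed by an OO-Do-aware classifier, with thresholds deciding what is kept) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Detection` type, the `in_domain_prob` callable, and both default threshold values are hypothetical names and numbers chosen only for illustration, and a real classifier would score the image crop rather than the detection record. The `f1_score` helper uses the standard definition F1 = 2TP / (2TP + FP + FN).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Detection:
    """One candidate box from the object detector (hypothetical structure)."""
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    label: str                              # predicted parasite-egg class
    det_score: float                        # detector confidence in [0, 1]


def two_step_filter(
    detections: List[Detection],
    in_domain_prob: Callable[[Detection], float],
    det_threshold: float = 0.5,  # illustrative value, not from the paper
    ood_threshold: float = 0.5,  # illustrative value, not from the paper
) -> List[Detection]:
    """Keep detections that pass the detector threshold AND are judged
    in-domain by a second-stage, OO-Do-aware classifier."""
    kept = []
    for det in detections:
        # Step 1: drop low-confidence boxes from the detector.
        if det.det_score < det_threshold:
            continue
        # Step 2: drop boxes the classifier considers out-of-domain.
        if in_domain_prob(det) < ood_threshold:
            continue
        kept.append(det)
    return kept


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 from true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

In this sketch the two thresholds are independent knobs: raising `ood_threshold` rejects more out-of-domain images at the cost of possibly discarding true parasite eggs, which is the kind of trade-off a comparison of thresholding strategies would examine.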