Nam Seung-Joo, Moon Gwiseong, Park Jung-Hwan, Kim Yoon, Lim Yun Jeong, Choi Hyun-Soo
Division of Gastroenterology and Hepatology, Department of Internal Medicine, Kangwon National University School of Medicine, Chuncheon 24341, Republic of Korea.
Ziovision Co., Ltd., Chuncheon 24341, Republic of Korea.
Biomedicines. 2024 Jul 31;12(8):1704. doi: 10.3390/biomedicines12081704.
Wireless capsule endoscopy (WCE) has significantly advanced the diagnosis of gastrointestinal (GI) diseases by allowing for the non-invasive visualization of the entire small intestine. However, machine learning-based methods for organ classification in WCE often rely on color information, leading to decreased performance when obstacles such as food debris are present. This study proposes a novel model that integrates convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to analyze multiple frames and incorporate temporal information, ensuring that it performs well even when visual information is limited.
We collected data from 126 patients using PillCam™ SB3 (Medtronic, Minneapolis, MN, USA), comprising 2,395,932 images. Our deep learning model was trained to identify organs (stomach, small intestine, and colon) using data from 44 training and 10 validation cases. We applied Gaussian-filter calibration to improve the accuracy of organ boundary detection. Additionally, we estimated the transit time of the capsule through the gastric and small intestine regions using a CNN-LSTM combination designed to capture the sequential information of continuous video. Finally, we evaluated the model's performance on WCE videos from 72 patients.
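As a rough illustration of this design (not the authors' released code), the sketch below pairs a CNN backbone with an LSTM over a window of consecutive frames and then Gaussian-smooths the per-frame class probabilities before locating organ boundaries. The backbone, window length, and smoothing width are assumptions made for illustration only.

```python
# Minimal sketch, assuming a ResNet-18 feature extractor and single-layer LSTM;
# the authors' actual architecture and hyperparameters may differ.
import torch
import torch.nn as nn
import torchvision.models as models
import numpy as np
from scipy.ndimage import gaussian_filter1d

class CNNLSTMOrganClassifier(nn.Module):
    def __init__(self, num_classes=3, hidden_size=256):
        super().__init__()
        backbone = models.resnet18(weights=None)         # assumed backbone
        backbone.fc = nn.Identity()                      # expose 512-d per-frame features
        self.cnn = backbone
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden_size,
                            num_layers=1, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)  # stomach / small intestine / colon

    def forward(self, frames):                           # frames: (B, T, 3, H, W)
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        seq, _ = self.lstm(feats)                        # temporal context across the window
        return self.head(seq)                            # per-frame logits: (B, T, 3)

def organ_boundaries(frame_probs, sigma=25):
    """Smooth per-frame probabilities (N, 3) with a 1-D Gaussian filter, then take
    the indices where the most likely organ changes as candidate boundary frames."""
    smoothed = gaussian_filter1d(frame_probs, sigma=sigma, axis=0)
    labels = smoothed.argmax(axis=1)                     # 0=stomach, 1=small intestine, 2=colon
    return np.flatnonzero(np.diff(labels)) + 1           # first frame of each new organ
```

Given the boundary frame indices and the capture timestamps of the video, gastric and small-bowel transit times would then follow as the elapsed time between successive boundaries.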
Our model demonstrated high performance in organ classification, achieving an accuracy, sensitivity, and specificity of over 95% for each organ (stomach, small intestine, and colon), with an overall accuracy and F1-score of 97.1%. The Matthews correlation coefficient (MCC) and geometric mean (G-mean) were used to evaluate performance on imbalanced data, yielding MCC values of 0.93 for the stomach, 0.91 for the small intestine, and 0.94 for the colon, and G-mean values of 0.96 for the stomach, 0.95 for the small intestine, and 0.97 for the colon. For the estimation of gastric and small intestine transit times, the mean time differences between the model predictions and the ground truth were 4.3 ± 9.7 min for the stomach and 24.7 ± 33.8 min for the small intestine. Notably, the model's gastric transit time predictions were within 15 min of the ground truth in 95.8% of test cases (69 of 72). Overall, the proposed model outperforms a model using only a CNN.
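The per-organ MCC and G-mean can be computed in a one-vs-rest fashion; the snippet below is a small sketch of one plausible way to do this with scikit-learn, taking G-mean as the geometric mean of sensitivity and specificity. The exact evaluation code used by the authors is not given in the abstract, so this is an assumption about the calculation rather than their implementation.

```python
# One-vs-rest evaluation sketch: binarize the 3-class labels per organ, then compute
# MCC from the binary confusion matrix and G-mean = sqrt(sensitivity * specificity).
import numpy as np
from sklearn.metrics import matthews_corrcoef, confusion_matrix

def per_class_mcc_gmean(y_true, y_pred, class_id):
    t = (np.asarray(y_true) == class_id).astype(int)
    p = (np.asarray(y_pred) == class_id).astype(int)
    tn, fp, fn, tp = confusion_matrix(t, p, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return matthews_corrcoef(t, p), np.sqrt(sensitivity * specificity)

# Example usage (class_id 0/1/2 assumed to be stomach / small intestine / colon):
# mcc_stomach, gmean_stomach = per_class_mcc_gmean(labels, preds, class_id=0)
```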
The combination of CNN and LSTM proves to be both accurate and clinically effective for organ classification and transit time estimation in WCE. Our model's ability to integrate temporal information allows it to maintain high performance even in challenging conditions where color information alone is insufficient. The inclusion of MCC and G-mean metrics further validates the robustness of our approach on imbalanced datasets. These findings suggest that the proposed method can significantly improve the diagnostic accuracy and efficiency of WCE, making it a valuable tool in clinical practice for diagnosing and managing GI diseases.