Yang Lijuan, Dong Qiumei, Lin Da, Lü Xinliang
Department of Rheumatology, Inner Mongolia Autonomous Region Hospital of Traditional Chinese Medicine, Hohhot, China.
College of Traditional Chinese Medicine, Inner Mongolia Medical University, Hohhot, China.
Front Physiol. 2025 Apr 25;16:1527751. doi: 10.3389/fphys.2025.1527751. eCollection 2025.
Tongue diagnosis in Traditional Chinese Medicine (TCM) plays a crucial role in clinical practice. By observing the shape, color, and coating of the tongue, practitioners can assist in determining the nature and location of a disease. However, the field of tongue diagnosis currently faces challenges such as data scarcity and a lack of efficient multimodal diagnostic models, making it difficult to fully align with TCM theories and clinical needs. Additionally, existing methods generally lack multi-label classification capabilities, making it challenging to simultaneously meet the multidimensional requirements of TCM diagnosis for disease nature and location. To address these issues, this paper proposes TongueNet, a multimodal deep learning model that integrates tongue image data with text-based features. The model utilizes a Hierarchical Aggregation Network (HAN) and a Feature Space Projection Module to efficiently extract and fuse features while introducing consistency and complementarity constraints to optimize multimodal information fusion. Furthermore, the model incorporates a multi-scale attention mechanism (EMA) to enhance the diversity and accuracy of feature weighting and employs a Kolmogorov-Arnold Network (KAN) instead of traditional MLPs for output optimization, thereby improving the representation of complex features. For model training, this study integrates three publicly available tongue image datasets from the Roboflow platform and enlists multiple experts for multimodal annotation, incorporating multi-label information on disease nature and location to align with TCM clinical needs. Experimental results demonstrate that TongueNet outperforms existing models in both disease nature and disease location classification tasks. Specifically, in the disease nature classification task, it achieves 89.12% accuracy and an AUC of 83%; in the disease location classification task, it achieves 86.47% accuracy and an AUC of 81%. 
Moreover, TongueNet contains only 32.1 million parameters, significantly reducing computational resource requirements while maintaining high diagnostic performance. TongueNet provides a new approach for the intelligent development of TCM tongue diagnosis.
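The abstract names consistency and complementarity constraints for multimodal fusion but does not give their formulas. The following is a minimal sketch of one common way such constraints are written, assuming each modality is projected into a "shared" part and a "private" part. The function names, the shared/private split, and the weights alpha and beta are hypothetical illustrations, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def consistency_loss(img_shared, txt_shared):
    """Assumed form: push the shared image and text features to agree.

    Equals 0 when the two vectors point the same way, 2 when opposite.
    """
    return 1.0 - cosine(img_shared, txt_shared)


def complementarity_loss(img_private, txt_private):
    """Assumed form: push the modality-specific features to carry different
    information by penalising their squared cosine similarity.

    Equals 0 when the two vectors are orthogonal, 1 when parallel.
    """
    return cosine(img_private, txt_private) ** 2


def fusion_penalty(img_shared, txt_shared, img_private, txt_private,
                   alpha=1.0, beta=1.0):
    """Weighted sum of both constraints; alpha and beta are hypothetical weights."""
    return (alpha * consistency_loss(img_shared, txt_shared)
            + beta * complementarity_loss(img_private, txt_private))


if __name__ == "__main__":
    # Toy vectors standing in for projected tongue-image and text features.
    penalty = fusion_penalty(
        img_shared=[1.0, 0.0], txt_shared=[1.0, 0.0],
        img_private=[0.0, 2.0], txt_private=[3.0, 0.0],
    )
    print(penalty)
```

In a training setup of this kind, such a penalty would typically be added to the classification losses of the disease nature and disease location heads; how TongueNet actually combines these terms is not stated in the abstract.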