Multi-Task Joint Learning Model for Chinese Word Segmentation and Syndrome Differentiation in Traditional Chinese Medicine.

Author affiliations

School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China.

Institute of Biomedical Engineering, School of Life Science, Shanghai University, Shanghai 200444, China.

Publication information

Int J Environ Res Public Health. 2022 May 5;19(9):5601. doi: 10.3390/ijerph19095601.

Abstract

Evidence-based treatment is the basis of traditional Chinese medicine (TCM), and accurate syndrome differentiation is essential to it. Automatically differentiating syndromes from unstructured medical records requires two steps: Chinese word segmentation and text classification. The ambiguity of the Chinese language and the peculiarities of syndrome differentiation make both tasks challenging. We model syndrome differentiation in TCM as text classification and use multi-task learning (MTL) with deep learning to perform Chinese word segmentation and syndrome differentiation jointly. Two classic deep neural networks, bidirectional long short-term memory (Bi-LSTM) and the text-based convolutional neural network (TextCNN), are fused within the MTL framework to carry out both tasks simultaneously. We evaluated the proposed method in extensive comparative experiments. On the Chinese word segmentation task, our model achieved an accuracy, specificity, and sensitivity of 0.93, 0.94, and 0.90, respectively; on the syndrome differentiation task, it achieved 0.80, 0.82, and 0.78. Moreover, statistical analyses showed that the accuracies of the non-joint and joint models were both within the 95% confidence interval, with p-value < 0.05. The comparisons showed that our method is superior to prevalent methods on both tasks. This work can help modernize TCM through intelligent syndrome differentiation.
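For readers less familiar with the three reported metrics, the following is a minimal sketch of how accuracy, specificity, and sensitivity are conventionally computed from binary predictions. The abstract does not say how the authors aggregated these metrics across syndrome classes or segmentation tags, so the one-vs-rest binary setup, the function names, and the toy labels below are all assumptions made for illustration, not the paper's evaluation code.

```python
from dataclasses import dataclass


@dataclass
class BinaryCounts:
    """Confusion-matrix counts for one positive class (hypothetical helper)."""
    tp: int
    fp: int
    tn: int
    fn: int


def count_outcomes(y_true, y_pred, positive=1):
    """Tally true/false positives and negatives for the given positive label."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth != positive and pred != positive:
            tn += 1
        else:
            fn += 1
    return BinaryCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else float("nan")


def accuracy(c):
    """Fraction of all predictions that are correct."""
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


def sensitivity(c):
    """True positive rate: TP / (TP + FN)."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c):
    """True negative rate: TN / (TN + FP)."""
    return _ratio(c.tn, c.tn + c.fp)


# Toy labels (assumed): 1 = record belongs to the syndrome, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
counts = count_outcomes(y_true, y_pred)
```

For a multi-class task such as syndrome differentiation, one common choice is to compute these counts per class in a one-vs-rest fashion and then average; whether the paper did so is not stated in the abstract.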


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a47/9103751/ccf6d6c6fc9b/ijerph-19-05601-g001.jpg
