IEEE/ACM Trans Comput Biol Bioinform. 2023 Nov-Dec;20(6):3343-3352. doi: 10.1109/TCBB.2022.3198798. Epub 2023 Dec 25.
Automatic disease diagnosis from clinical data has long suffered from feature sparsity and a high probability of missing values. Because graph neural networks are an effective tool for modeling structural information and inferring missing values, they are becoming the dominant method for constructing predictive models from electronic medical records (EMRs). Existing graph neural network based solutions usually adopt medical concepts (e.g., symptoms) as the feature representation of clinical data without considering their underlying semantic relations. The limited discriminative capability of individual medical concepts cannot provide sufficient indicative information about a disease. This article proposes a knowledge-guided graph attention network for disease prediction. Besides extracting the attribute-value structure as a large-size medical concept, the graph construction takes into account the mutual information between multiple medical concepts mentioned in the EMRs. Meanwhile, the defined diseases and their associations with medical concepts in a medical knowledge graph are incorporated into the graph, which has the potential to enhance the indicative impact of medical concepts directly related to a target disease. Spatial and attention-based graph encoders are then employed to aggregate information from directly neighboring nodes and generate node embeddings that serve as compact features for disease diagnosis. The approach is general and can be used to build predictive models from Chinese EMRs for different diseases. Empirical experiments evaluating its performance are conducted on a real-world COPD EMR dataset. The comparison results show that the proposed model outperforms the baseline methods, which illustrates the effectiveness of the proposed model.
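The abstract does not specify how the mutual information between concepts is computed or how the attention encoder is parameterized, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes pointwise mutual information (PMI) over per-record concept co-occurrence for concept-concept edges, adds disease-concept edges from a knowledge graph, and applies one single-head attention update in the standard GAT form. All function names, the toy concepts, and the knowledge-graph links are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def pmi_edges(records, min_pmi=0.0):
    """Assumed PMI weighting: keep concept pairs whose co-occurrence PMI exceeds min_pmi."""
    n = len(records)
    single, pair = Counter(), Counter()
    for rec in records:
        concepts = sorted(set(rec))          # count each concept once per record
        single.update(concepts)
        pair.update(combinations(concepts, 2))
    edges = {}
    for (a, b), c_ab in pair.items():
        pmi = math.log((c_ab / n) / ((single[a] / n) * (single[b] / n)))
        if pmi > min_pmi:
            edges[(a, b)] = pmi
    return edges


def build_graph(concept_edges, kg_links):
    """Undirected adjacency from concept-concept edges plus (hypothetical) disease-concept KG links."""
    adj = defaultdict(set)
    for a, b in concept_edges:
        adj[a].add(b)
        adj[b].add(a)
    for disease, concept in kg_links:
        adj[disease].add(concept)
        adj[concept].add(disease)
    return dict(adj)


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_aggregate(node, neighbors, feats, W, a):
    """One single-head GAT-style update of `node` over itself and its direct neighbors."""
    candidates = [node] + [v for v in neighbors if v != node]
    proj = {v: matvec(W, feats[v]) for v in candidates}           # W h_v
    # e_ij = LeakyReLU(a^T [W h_i || W h_j])
    scores = [leaky_relu(sum(ai * zi for ai, zi in zip(a, proj[node] + proj[v])))
              for v in candidates]
    m = max(scores)                                                 # numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(proj[node])
    # Attention-weighted sum of projected features (output nonlinearity omitted).
    out = [sum(al * proj[v][k] for al, v in zip(alphas, candidates)) for k in range(dim)]
    return out, dict(zip(candidates, alphas))


if __name__ == "__main__":
    records = [["cough", "sputum", "dyspnea"], ["cough", "sputum"], ["fever"], ["dyspnea", "wheeze"]]
    edges = pmi_edges(records)
    adj = build_graph(edges, [("COPD", "dyspnea"), ("COPD", "wheeze")])
    feats = {"COPD": [0.0, 0.0], "dyspnea": [3.0, 0.0], "wheeze": [0.0, 3.0],
             "cough": [1.0, 1.0], "sputum": [1.0, 0.0]}
    W = [[1.0, 0.0], [0.0, 1.0]]
    a = [0.1, -0.2, 0.3, 0.4]
    print(edges)
    print(gat_aggregate("COPD", adj["COPD"], feats, W, a))
```

In this toy data only pairs that co-occur more often than chance (PMI > 0) become edges, and the disease node "COPD" is attached to concepts through the assumed knowledge-graph links, so its embedding is aggregated from exactly those directly related concepts.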