Soman Karthik, Nelson Charlotte A, Cerono Gabriel, Goldman Samuel M, Baranzini Sergio E, Brown Ethan G
Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, United States.
Division of Occupational and Environmental Medicine, University of California, San Francisco, San Francisco, CA, United States.
Front Med (Lausanne). 2023 May 12;10:1081087. doi: 10.3389/fmed.2023.1081087. eCollection 2023.
Early diagnosis of Parkinson's disease (PD) is important to identify treatments to slow neurodegeneration. People who develop PD often have symptoms before the disease manifests, and these symptoms may be coded as diagnoses in the electronic health record (EHR).
To predict PD diagnosis, we embedded patients' EHR data onto a biomedical knowledge graph called the Scalable Precision medicine Open Knowledge Engine (SPOKE) and created patient embedding vectors. We trained and validated a classifier using these vectors from 3,004 PD patients, with records restricted to 1, 3, and 5 years before diagnosis, and from a non-PD group of 457,197 patients.
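As a rough illustration of what embedding EHR data onto a knowledge graph can look like, the sketch below spreads a patient's coded diagnoses across a toy graph and reads off one weight per node as the patient vector. The abstract does not say how SPOKE computes its vectors, so random walk with restart is an assumed stand-in here, and the graph, the node names, and the `patient_vector` function are all hypothetical.

```python
# Toy knowledge graph: node -> neighbours (undirected).
# Every node name here is hypothetical, not taken from SPOKE.
GRAPH = {
    "dx:constipation": ["sym:gi_dysmotility"],
    "dx:sleep_disorder": ["sym:rem_sleep_behavior"],
    "sym:gi_dysmotility": ["dx:constipation", "disease:parkinson"],
    "sym:rem_sleep_behavior": ["dx:sleep_disorder", "disease:parkinson"],
    "disease:parkinson": ["sym:gi_dysmotility", "sym:rem_sleep_behavior", "gene:SNCA"],
    "gene:SNCA": ["disease:parkinson"],
}


def patient_vector(graph, ehr_nodes, restart=0.3, iterations=100):
    """Random walk with restart (an assumed technique) from a patient's EHR nodes.

    Returns one weight per graph node, in sorted node order, summing to 1.
    """
    nodes = sorted(graph)
    entry = set(ehr_nodes)
    # The walker restarts uniformly at the nodes matched to the patient's record.
    seed = {n: (1.0 / len(entry) if n in entry else 0.0) for n in nodes}
    rank = dict(seed)
    for _ in range(iterations):
        nxt = {n: restart * seed[n] for n in nodes}
        for n in nodes:
            # Each node passes the rest of its mass equally to its neighbours.
            share = (1.0 - restart) * rank[n] / len(graph[n])
            for m in graph[n]:
                nxt[m] += share
        rank = nxt
    return [rank[n] for n in nodes]


# A hypothetical patient whose record contains a single coded diagnosis.
vec = patient_vector(GRAPH, {"dx:constipation"})
for name, weight in zip(sorted(GRAPH), vec):
    print(f"{name:26s} {weight:.3f}")
```

Vectors of this kind, one per patient, could then be passed to any standard classifier. Because each dimension corresponds to a named graph node, a high-weight dimension can be traced back to a biomedical concept.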
The classifier predicted PD diagnosis with moderate accuracy (AUC = 0.77 ± 0.06, 0.74 ± 0.05, and 0.72 ± 0.05 at 1, 3, and 5 years before diagnosis, respectively) and performed better than the other benchmark methods. Among cases, nodes in the SPOKE graph revealed novel associations, while SPOKE patient vectors revealed the basis for individual risk classification.
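For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control. The minimal sketch below computes it by direct pairwise comparison. The `auc` function and the scores are made up for illustration and are not the study's data. How the reported ± values were obtained is not stated here, so the sketch does not attempt to reproduce them.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Illustrative scores only: two cases (label 1) and three controls (label 0).
scores = [0.9, 0.4, 0.35, 0.8, 0.1]
labels = [1, 1, 0, 0, 0]
print(round(auc(scores, labels), 3))  # 5 of 6 case-control pairs ranked correctly -> 0.833
```

The pairwise form is quadratic in the number of patients, so a cohort of the size described here would call for a rank-based implementation instead.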
The proposed method was able to explain the clinical predictions using the knowledge graph, thereby making the predictions clinically interpretable. Through enriching EHR data with biomedical associations, SPOKE may be a cost-efficient and personalized way to predict PD diagnosis years before its occurrence.