Alsentzer Emily, Li Michelle M, Kobren Shilpa N, Noori Ayush, Kohane Isaac S, Zitnik Marinka
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
Program in Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA, USA.
NPJ Digit Med. 2025 Jun 20;8(1):380. doi: 10.1038/s41746-025-01749-1.
There are over 7000 rare diseases, some affecting 3500 or fewer patients in the United States. Due to clinicians' limited experience with such diseases and the heterogeneity of clinical presentations, ~70% of individuals seeking a diagnosis remain undiagnosed. Deep learning has demonstrated success in aiding the diagnosis of common diseases. However, existing approaches require labeled datasets with thousands of diagnosed patients per disease. We present SHEPHERD, a few-shot learning approach for multi-faceted rare disease diagnosis. SHEPHERD performs deep learning over a knowledge graph enriched with rare disease information and is trained on a dataset of simulated rare disease patients. We demonstrate SHEPHERD's effectiveness across diverse diagnostic tasks, performing causal gene discovery, retrieving "patients-like-me", and characterizing novel disease presentations, using real-world cohorts from the Undiagnosed Diseases Network (N = 465), MyGene2 (N = 146), and the Deciphering Developmental Disorders study (N = 1431). SHEPHERD demonstrates the potential of knowledge-grounded deep learning to accelerate rare disease diagnosis.