Song Jie, He Mengqiao, Ren Shumin, Shen Bairong
Department of Ophthalmology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China.
Sci Data. 2025 Apr 15;12(1):634. doi: 10.1038/s41597-025-04922-z.
Distinctive facial phenotypes serve as crucial diagnostic markers for many rare genetic diseases. Although AI-driven image recognition achieves high diagnostic accuracy, it often fails to explain its predictions. In this study, we present the Facial phenotype-Gene-Disease Dataset (FGDD), an explainable dataset collected from 509 research publications. It contains 1,147 data records encompassing 197 disease-causing genes, 437 facial phenotypes, and 211 disease entities, with 689 records having disease labels. Each data record represents a patient group and includes demographic information, variation information, and phenotype information. Baseline and explainability validations conducted on FGDD confirmed the dataset's effectiveness. FGDD supports the training of diagnostic models for rare genetic diseases while delivering explainable results, and provides a foundation for exploring intricate connections between genes, diseases, and facial phenotypes.
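The abstract says each FGDD record describes one patient group and carries demographic, variation, and phenotype information, and that only some records have a disease label. The Python sketch below shows one way such records could be represented and queried. It is not the dataset's published schema. The class name `PatientGroupRecord`, all field names, and the helpers `labeled_fraction` and `gene_to_phenotypes` are hypothetical, chosen only to illustrate the structure.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PatientGroupRecord:
    """Hypothetical layout of one FGDD-style record (one patient group).

    Field names are illustrative assumptions, not the published FGDD schema.
    """
    # Demographic information (hypothetical fields)
    num_patients: int
    sex: Optional[str] = None
    ethnicity: Optional[str] = None
    # Variation information: disease-causing gene and a variant description
    gene: str = ""
    variant: Optional[str] = None
    # Phenotype information: facial phenotype terms reported for the group
    facial_phenotypes: list[str] = field(default_factory=list)
    # Disease label; optional because not every record is labeled
    disease: Optional[str] = None


def labeled_fraction(records: list[PatientGroupRecord]) -> float:
    """Return the fraction of records that carry a disease label."""
    if not records:
        return 0.0
    return sum(r.disease is not None for r in records) / len(records)


def gene_to_phenotypes(records: list[PatientGroupRecord]) -> dict[str, set[str]]:
    """Map each gene to the set of facial phenotypes reported with it."""
    index: dict[str, set[str]] = {}
    for r in records:
        index.setdefault(r.gene, set()).update(r.facial_phenotypes)
    return index


if __name__ == "__main__":
    # Placeholder genes and phenotypes, not real associations.
    demo = [
        PatientGroupRecord(num_patients=3, gene="GENE_A",
                           facial_phenotypes=["p1", "p2"], disease="D1"),
        PatientGroupRecord(num_patients=1, gene="GENE_A",
                           facial_phenotypes=["p3"]),
    ]
    print(labeled_fraction(demo))
    print(gene_to_phenotypes(demo))
```

With the counts reported in the abstract, `labeled_fraction` over all records would be 689 / 1,147, or roughly 60%. Making `disease` optional keeps unlabeled records in the same structure, so they can still feed gene-phenotype association queries such as `gene_to_phenotypes`.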