Danis Daniel, Bamshad Michael J, Bridges Yasemin, Cacheiro Pilar, Carmody Leigh C, Chong Jessica X, Coleman Ben, Dalgleish Raymond, Freeman Peter J, Graefe Adam S L, Groza Tudor, Jacobsen Julius O B, Klocperk Adam, Kusters Maaike, Ladewig Markus S, Marcello Anthony J, Mattina Teresa, Mungall Christopher J, Munoz-Torres Monica C, Reese Justin T, Rehburg Filip, Reis Bárbara C S, Schuetz Catharina, Smedley Damian, Strauss Timmy, Sundaramurthi Jagadish Chandrabose, Thun Sylvia, Wissink Kyran, Wagstaff John F, Zocche David, Haendel Melissa A, Robinson Peter N
The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA.
Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany.
medRxiv. 2024 May 29:2024.05.29.24308104. doi: 10.1101/2024.05.29.24308104.
The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present phenopacket-store. Version 0.1.12 of phenopacket-store includes 4916 phenopackets representing 277 Mendelian and chromosomal diseases associated with 236 genes, and 2872 unique pathogenic alleles curated from 605 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
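To make concrete what using a phenopacket as an input file can look like, the following minimal sketch parses a phenopacket-style JSON document and separates observed from excluded phenotypic features. The record is a hypothetical, hand-written example and is not taken from phenopacket-store. The field names (`subject`, `phenotypicFeatures`, `type`, `excluded`) are assumed to follow the JSON form of version 2 of the GA4GH Phenopacket Schema, and `split_features` is an illustrative helper name, not part of any library.

```python
import json

# Hypothetical, hand-written record for illustration only;
# it is NOT taken from phenopacket-store.
EXAMPLE_JSON = """
{
  "id": "example-individual-1",
  "subject": {"id": "individual-1"},
  "phenotypicFeatures": [
    {"type": {"id": "HP:0001166", "label": "Arachnodactyly"}},
    {"type": {"id": "HP:0001250", "label": "Seizure"}, "excluded": true}
  ]
}
"""


def split_features(phenopacket):
    """Return (observed, excluded) lists of (HPO id, label) tuples.

    Assumes the phenopacket is a dict parsed from schema-v2-style JSON,
    where a feature counts as excluded only if "excluded" is true.
    """
    observed, excluded = [], []
    for feature in phenopacket.get("phenotypicFeatures", []):
        term = feature["type"]
        target = excluded if feature.get("excluded", False) else observed
        target.append((term["id"], term["label"]))
    return observed, excluded


phenopacket = json.loads(EXAMPLE_JSON)
observed, excluded = split_features(phenopacket)
print("subject:", phenopacket["subject"]["id"])
print("observed:", observed)
print("excluded:", excluded)
```

Only the Python standard library is used here, so the sketch does not depend on any particular phenopacket toolkit. A real pipeline would typically also read the interpretation and disease elements of each file.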