Danis Daniel, Bamshad Michael J, Bridges Yasemin, Caballero-Oteyza Andrés, Cacheiro Pilar, Carmody Leigh C, Chimirri Leonardo, Chong Jessica X, Coleman Ben, Dalgleish Raymond, Freeman Peter J, Graefe Adam S L, Groza Tudor, Hansen Peter, Jacobsen Julius O B, Klocperk Adam, Kusters Maaike, Ladewig Markus S, Marcello Allison J, Mattina Teresa, Mungall Christopher J, Munoz-Torres Monica C, Reese Justin T, Rehburg Filip, Reis Bárbara C S, Schuetz Catharina, Smedley Damian, Strauss Timmy, Sundaramurthi Jagadish Chandrabose, Thun Sylvia, Wissink Kyran, Wagstaff John F, Zocche David, Haendel Melissa A, Robinson Peter N
Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany; The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA.
Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA; Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle, WA 98195, USA; Department of Pediatrics, Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA 98195, USA.
HGG Adv. 2025 Jan 9;6(1):100371. doi: 10.1016/j.xhgg.2024.100371. Epub 2024 Oct 10.
The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present Phenopacket Store. Phenopacket Store v.0.1.19 includes 6,668 phenopackets representing 475 Mendelian and chromosomal diseases associated with 423 genes and 3,834 unique pathogenic alleles curated from 959 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
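As an illustration of how a phenopacket might serve as an input file, the following is a minimal sketch rather than code from Phenopacket Store. It assumes the JSON serialization of the Phenopacket Schema, with `phenotypicFeatures` entries that carry a `type` ontology term and an optional `excluded` flag, and a `diseases` list whose entries carry a `term`. The embedded record is hand-written and hypothetical, and the helper names `summarize` and `count_diseases` are invented for this example. Real files may record the diagnosis under `interpretations` instead, so the field access would need adjusting.

```python
import json
from collections import Counter

# Hypothetical, hand-written record for illustration only;
# it is NOT taken from Phenopacket Store.
EXAMPLE_PHENOPACKET = """
{
  "id": "example-1",
  "subject": {"id": "patient-1", "sex": "FEMALE"},
  "phenotypicFeatures": [
    {"type": {"id": "HP:0001166", "label": "Arachnodactyly"}},
    {"type": {"id": "HP:0001250", "label": "Seizure"}, "excluded": true}
  ],
  "diseases": [
    {"term": {"id": "OMIM:154700", "label": "Marfan syndrome"}}
  ]
}
"""


def summarize(packet: dict) -> dict:
    """Split phenotypic features into observed/excluded and list disease IDs.

    Assumes the field names described above; missing fields are treated
    as empty lists.
    """
    features = packet.get("phenotypicFeatures", [])
    observed = [f["type"]["id"] for f in features if not f.get("excluded", False)]
    excluded = [f["type"]["id"] for f in features if f.get("excluded", False)]
    diseases = [d["term"]["id"] for d in packet.get("diseases", [])]
    return {
        "id": packet.get("id"),
        "observed": observed,
        "excluded": excluded,
        "diseases": diseases,
    }


def count_diseases(packets) -> Counter:
    """Tally how many phenopackets mention each disease ID."""
    counts = Counter()
    for packet in packets:
        counts.update(summarize(packet)["diseases"])
    return counts


if __name__ == "__main__":
    packet = json.loads(EXAMPLE_PHENOPACKET)
    print(summarize(packet))
    print(count_diseases([packet, packet]))
```

For a directory of downloaded phenopacket files, the same functions could be applied to each file after `json.load`, which gives a quick per-disease case count before the records are passed to a prioritization or stratification tool.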