A corpus of GA4GH Phenopackets: case-level phenotyping for genomic diagnostics and discovery.

Authors

Danis Daniel, Bamshad Michael J, Bridges Yasemin, Cacheiro Pilar, Carmody Leigh C, Chong Jessica X, Coleman Ben, Dalgleish Raymond, Freeman Peter J, Graefe Adam S L, Groza Tudor, Jacobsen Julius O B, Klocperk Adam, Kusters Maaike, Ladewig Markus S, Marcello Anthony J, Mattina Teresa, Mungall Christopher J, Munoz-Torres Monica C, Reese Justin T, Rehburg Filip, Reis Bárbara C S, Schuetz Catharina, Smedley Damian, Strauss Timmy, Sundaramurthi Jagadish Chandrabose, Thun Sylvia, Wissink Kyran, Wagstaff John F, Zocche David, Haendel Melissa A, Robinson Peter N

Affiliations

The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA.

Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany.

Publication information

medRxiv. 2024 May 29:2024.05.29.24308104. doi: 10.1101/2024.05.29.24308104.

DOI:10.1101/2024.05.29.24308104
PMID:38854034
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11160806/
Abstract

The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present phenopacket-store. Version 0.1.12 of phenopacket-store includes 4916 phenopackets representing 277 Mendelian and chromosomal diseases associated with 236 genes, and 2872 unique pathogenic alleles curated from 605 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
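
The abstract notes that a phenopacket can serve as an input file for phenotype-driven software. As a minimal sketch of what that might look like, the Python below reads one phenopacket in its JSON form and collects the observed and excluded phenotype terms and the diagnosed disease identifiers. The example record is hand-written for illustration and is not taken from phenopacket-store; the field names (`phenotypicFeatures`, `excluded`, `interpretations`, `diagnosis`) are assumed to follow the JSON serialization of Phenopacket Schema version 2, and `summarize_phenopacket` is a hypothetical helper, not part of any GA4GH library.

```python
import json

# Hypothetical, hand-written record for illustration only (assumed to follow
# the Phenopacket Schema v2 JSON layout); it is not from phenopacket-store.
EXAMPLE_JSON = """
{
  "id": "example-case-1",
  "subject": {"id": "individual-1", "sex": "FEMALE"},
  "phenotypicFeatures": [
    {"type": {"id": "HP:0001166", "label": "Arachnodactyly"}},
    {"type": {"id": "HP:0001250", "label": "Seizure"}, "excluded": true}
  ],
  "interpretations": [
    {
      "id": "interpretation-1",
      "diagnosis": {
        "disease": {"id": "OMIM:154700", "label": "Marfan syndrome"}
      }
    }
  ]
}
"""


def summarize_phenopacket(packet):
    """Return observed/excluded phenotype term IDs and disease IDs."""
    observed, excluded = [], []
    for feature in packet.get("phenotypicFeatures", []):
        term_id = feature["type"]["id"]
        # A feature counts as observed unless it is explicitly marked excluded.
        if feature.get("excluded", False):
            excluded.append(term_id)
        else:
            observed.append(term_id)

    diseases = [
        interpretation["diagnosis"]["disease"]["id"]
        for interpretation in packet.get("interpretations", [])
        if "diagnosis" in interpretation
    ]
    return {
        "id": packet["id"],
        "observed": observed,
        "excluded": excluded,
        "diseases": diseases,
    }


if __name__ == "__main__":
    summary = summarize_phenopacket(json.loads(EXAMPLE_JSON))
    print(summary)
```

A gene or disease prioritization tool would typically take the observed and excluded term lists as its phenotype input and could use the recorded diagnosis as the expected answer when benchmarking.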

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4bc4/11160806/70362494cff7/nihpp-2024.05.29.24308104v1-f0001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4bc4/11160806/5eac1f896383/nihpp-2024.05.29.24308104v1-f0002.jpg
