Pheno-Ranker: a toolkit for comparison of phenotypic data stored in GA4GH standards and beyond.

Authors

Ivo C. Leist, María Rivas-Torrubia, Marta E. Alarcón-Riquelme, Guillermo Barturen, PRECISESADS Clinical Consortium, Ivo G. Gut, Manuel Rueda

Affiliations

Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028, Barcelona, Spain.

Universitat de Barcelona (UB), Barcelona, Spain.

Publication information

BMC Bioinformatics. 2024 Dec 4;25(1):373. doi: 10.1186/s12859-024-05993-2.

DOI: 10.1186/s12859-024-05993-2
PMID: 39633268
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11616229/
Abstract

BACKGROUND

Phenotypic data comparison is essential for disease association studies, patient stratification, and genotype-phenotype correlation analysis. To support these efforts, the Global Alliance for Genomics and Health (GA4GH) established Phenopackets v2 and Beacon v2 standards for storing, sharing, and discovering genomic and phenotypic data. These standards provide a consistent framework for organizing biological data, simplifying their transformation into computer-friendly formats. However, matching participants using GA4GH-based formats remains challenging, as current methods are not fully compatible, limiting their effectiveness.

RESULTS

Here, we introduce Pheno-Ranker, an open-source software toolkit for individual-level comparison of phenotypic data. As input, it accepts JSON/YAML data exchange formats from Beacon v2 and Phenopackets v2 data models, as well as any data structure encoded in JSON, YAML, or CSV formats. Internally, the hierarchical data structure is flattened to one dimension and then transformed through one-hot encoding. This allows for efficient pairwise (all-to-all) comparisons within cohorts or for matching of a patient's profile in cohorts. Users have the flexibility to refine their comparisons by including or excluding terms, applying weights to variables, and obtaining statistical significance through Z-scores and p-values. The output consists of text files, which can be further analyzed using unsupervised learning techniques, such as clustering or multidimensional scaling (MDS), and with graph analytics. Pheno-Ranker's performance has been validated with simulated and synthetic data, showing its accuracy, robustness, and efficiency across various health data scenarios. A real data use case from the PRECISESADS study highlights its practical utility in clinical research.
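The flatten, one-hot encode, and compare steps described above can be illustrated with a minimal Python sketch. This is not Pheno-Ranker's own code: the function names (`flatten`, `one_hot`, `pairwise_hamming`, `z_scores`), the `path=value` key format, and the toy records are all hypothetical. The abstract does not name a distance metric, so Hamming distance is used here as one simple choice, and the Z-score is a plain standardization of the distances rather than the tool's documented procedure.

```python
from itertools import combinations
from statistics import mean, pstdev


def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into a set of 'path=value' strings."""
    terms = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            terms |= flatten(value, path)
    elif isinstance(obj, list):
        for value in obj:
            # List positions are dropped so element order does not matter.
            terms |= flatten(value, prefix)
    else:
        terms.add(f"{prefix}={obj}")
    return terms


def one_hot(records):
    """Build a shared vocabulary and one binary vector per record."""
    flat = {rid: flatten(rec) for rid, rec in records.items()}
    vocabulary = sorted(set().union(*flat.values()))
    vectors = {
        rid: [1 if term in terms else 0 for term in vocabulary]
        for rid, terms in flat.items()
    }
    return vocabulary, vectors


def pairwise_hamming(vectors):
    """All-to-all Hamming distances, keyed by sorted (id_a, id_b) pairs."""
    return {
        (a, b): sum(x != y for x, y in zip(vectors[a], vectors[b]))
        for a, b in combinations(sorted(vectors), 2)
    }


def z_scores(distances):
    """Standardize a list of distances (assumed scheme, population stdev)."""
    mu, sigma = mean(distances), pstdev(distances)
    return [0.0 if sigma == 0 else (d - mu) / sigma for d in distances]


# Toy cohort with made-up records (hypothetical field names and values).
cohort = {
    "P1": {"sex": "female",
           "phenotypicFeatures": [{"id": "HP:0001250"}, {"id": "HP:0001263"}]},
    "P2": {"sex": "female",
           "phenotypicFeatures": [{"id": "HP:0001250"}]},
    "P3": {"sex": "male",
           "phenotypicFeatures": [{"id": "HP:0004322"}]},
}

vocabulary, vectors = one_hot(cohort)
for pair, distance in pairwise_hamming(vectors).items():
    print(pair, distance)
```

In this toy cohort, P1 and P2 differ in a single term, while P3 shares no terms with either, so it ends up farthest from both. A distance table of this kind is the sort of text output that could then be passed to clustering or MDS.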

CONCLUSIONS

Pheno-Ranker is a user-friendly, lightweight software for semantic similarity analysis of phenotypic data in Beacon v2 and Phenopackets v2 formats, extendable to other data types. It enables the comparison of a wide range of variables beyond HPO or OMIM terms while preserving full context. The software is designed as a command-line tool with additional utilities for CSV import, data simulation, summary statistics plotting, and QR code generation. For interactive analysis, it also includes a web-based user interface built with R Shiny. Links to the online documentation, including a Google Colab tutorial, and the tool's source code are available on the project home page: https://github.com/CNAG-Biomedical-Informatics/pheno-ranker
