Pheno-Ranker: a toolkit for comparison of phenotypic data stored in GA4GH standards and beyond.

Authors

Leist Ivo C, Rivas-Torrubia María, Alarcón-Riquelme Marta E, Barturen Guillermo, PRECISESADS Clinical Consortium, Gut Ivo G, Rueda Manuel

Affiliations

Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028, Barcelona, Spain.

Universitat de Barcelona (UB), Barcelona, Spain.

Publication information

BMC Bioinformatics. 2024 Dec 4;25(1):373. doi: 10.1186/s12859-024-05993-2.

Abstract

BACKGROUND

Phenotypic data comparison is essential for disease association studies, patient stratification, and genotype-phenotype correlation analysis. To support these efforts, the Global Alliance for Genomics and Health (GA4GH) established Phenopackets v2 and Beacon v2 standards for storing, sharing, and discovering genomic and phenotypic data. These standards provide a consistent framework for organizing biological data, simplifying their transformation into computer-friendly formats. However, matching participants using GA4GH-based formats remains challenging, as current methods are not fully compatible, limiting their effectiveness.

RESULTS

Here, we introduce Pheno-Ranker, an open-source software toolkit for individual-level comparison of phenotypic data. As input, it accepts JSON/YAML data exchange formats from Beacon v2 and Phenopackets v2 data models, as well as any data structure encoded in JSON, YAML, or CSV formats. Internally, the hierarchical data structure is flattened to one dimension and then transformed through one-hot encoding. This allows for efficient pairwise (all-to-all) comparisons within cohorts or for matching of a patient's profile in cohorts. Users have the flexibility to refine their comparisons by including or excluding terms, applying weights to variables, and obtaining statistical significance through Z-scores and p-values. The output consists of text files, which can be further analyzed using unsupervised learning techniques, such as clustering or multidimensional scaling (MDS), and with graph analytics. Pheno-Ranker's performance has been validated with simulated and synthetic data, showing its accuracy, robustness, and efficiency across various health data scenarios. A real data use case from the PRECISESADS study highlights its practical utility in clinical research.
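The flatten, encode, and compare pipeline described above can be illustrated with a minimal Python sketch. It is not Pheno-Ranker's implementation: the function names, the toy records, the `EX:` identifiers, and the choice to drop list positions are all hypothetical. The abstract does not name a distance metric, so Hamming distance and the Jaccard index are used here only as plausible examples, and the Z-score is computed against the toy cohort's own pairwise distances.

```python
from itertools import combinations
from statistics import mean, pstdev


def flatten(record, prefix=""):
    """Flatten nested dicts/lists into a set of 'path.to.key=value' strings."""
    items = set()
    if isinstance(record, dict):
        for key, value in record.items():
            items |= flatten(value, f"{prefix}{key}.")
    elif isinstance(record, list):
        for value in record:
            # Assumption: list positions are dropped, so element order is ignored.
            items |= flatten(value, prefix)
    else:
        items.add(f"{prefix.rstrip('.')}={record}")
    return items


def one_hot(flat_records):
    """Build a shared sorted vocabulary and encode each record as a 0/1 vector."""
    vocabulary = sorted(set().union(*flat_records))
    vectors = [[1 if term in flat else 0 for term in vocabulary]
               for flat in flat_records]
    return vocabulary, vectors


def hamming(a, b, weights=None):
    """(Optionally weighted) count of positions where two vectors differ."""
    weights = weights or [1] * len(a)
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def jaccard(a, b):
    """Shared terms divided by terms present in either vector."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def pairwise_distances(ids, vectors, weights=None):
    """All-to-all Hamming distances, keyed by (id_a, id_b)."""
    index = dict(zip(ids, vectors))
    return {(a, b): hamming(index[a], index[b], weights)
            for a, b in combinations(ids, 2)}


def z_scores(distances):
    """Standardize each distance against the distribution of all pairs."""
    values = list(distances.values())
    mu, sigma = mean(values), pstdev(values)
    return {pair: (d - mu) / sigma if sigma else 0.0
            for pair, d in distances.items()}


# Hypothetical toy cohort; the 'EX:' codes are placeholders, not real ontology terms.
cohort = {
    "P1": {"sex": "female", "diseases": [{"id": "EX:D1"}],
           "phenotypes": [{"id": "EX:P1"}, {"id": "EX:P2"}]},
    "P2": {"sex": "female", "diseases": [{"id": "EX:D1"}],
           "phenotypes": [{"id": "EX:P1"}]},
    "P3": {"sex": "male", "diseases": [{"id": "EX:D2"}],
           "phenotypes": [{"id": "EX:P3"}]},
}

ids = list(cohort)
vocabulary, vectors = one_hot([flatten(cohort[pid]) for pid in ids])
distances = pairwise_distances(ids, vectors)
z = z_scores(distances)

# Up-weight the 'sex' variable (illustrative weighting scheme).
sex_weights = [3 if term.startswith("sex=") else 1 for term in vocabulary]
weighted = pairwise_distances(ids, vectors, sex_weights)

for pair in distances:
    print(pair, distances[pair], weighted[pair], round(z[pair], 2))
```

In this toy cohort, P1 and P2 differ in a single encoded term and so receive the smallest distance and the lowest Z-score. Up-weighting `sex` widens the gap to P3 without changing the P1 to P2 distance.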

CONCLUSIONS

Pheno-Ranker is a user-friendly, lightweight software for semantic similarity analysis of phenotypic data in Beacon v2 and Phenopackets v2 formats, extendable to other data types. It enables the comparison of a wide range of variables beyond HPO or OMIM terms while preserving full context. The software is designed as a command-line tool with additional utilities for CSV import, data simulation, summary statistics plotting, and QR code generation. For interactive analysis, it also includes a web-based user interface built with R Shiny. Links to the online documentation, including a Google Colab tutorial, and the tool's source code are available on the project home page: https://github.com/CNAG-Biomedical-Informatics/pheno-ranker .

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0dc3/11616229/bbd6a589c97f/12859_2024_5993_Fig1_HTML.jpg
