pBRIT: gene prioritization by correlating functional and phenotypic annotations through integrative data fusion.

Affiliations

Center of Medical Genetics, University of Antwerp and Antwerp University Hospital, Antwerp, Belgium.

Biomedical Informatics Research Network Antwerp (biomina), University of Antwerp, Antwerp, Belgium.

Publication information

Bioinformatics. 2018 Jul 1;34(13):2254-2262. doi: 10.1093/bioinformatics/bty079.

DOI:10.1093/bioinformatics/bty079

PMID:29452392

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6022555/

Abstract

MOTIVATION

Computational gene prioritization can aid in disease gene identification. Here, we propose pBRIT (prioritization using Bayesian Ridge regression and Information Theoretic model), a novel adaptive and scalable prioritization tool that integrates PubMed abstracts, Gene Ontology, sequence similarities, Mammalian and Human Phenotype Ontology, pathways, interactions, Disease Ontology, Gene Association database and Human Genome Epidemiology database into the prediction model. We explore and address the effects of sparsity and inter-feature dependencies within annotation sources, and the impact of bias towards specific annotations.

RESULTS

pBRIT models feature dependencies and sparsity by an Information-Theoretic (data-driven) approach and applies data fusion based on intermediate integration. Following the hypothesis that genes underlying similar diseases will share functional and phenotype characteristics, it incorporates Bayesian Ridge regression to learn a linear mapping between functional and phenotype annotations. Genes are prioritized on phenotypic concordance to the training genes. We evaluated pBRIT against nine existing methods, and on over 2000 HPO-gene associations retrieved after construction of the pBRIT data sources. We achieve maximum AUC scores ranging from 0.92 to 0.96 against benchmark datasets and of 0.80 against the time-stamped HPO entries, indicating good performance with high sensitivity and specificity. Our model shows stable performance with regard to changes in the underlying annotation data, and is fast and scalable enough for implementation in routine pipelines.
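
The idea of learning a linear mapping from functional to phenotype annotations and then ranking candidates by phenotypic concordance can be illustrated with a small sketch. This is not the pBRIT implementation: the function names (`fit_bayesian_ridge`, `prioritize`), the fixed precision hyperparameters `alpha` and `lam`, the cosine-similarity score, and the synthetic matrices are all assumptions made for illustration. With fixed precisions, the posterior mean of the weights in a Bayesian ridge model has a closed form, which is what the sketch uses.

```python
import numpy as np

def fit_bayesian_ridge(X, Y, alpha=1.0, lam=1e-2):
    """Posterior mean of W for Y ~ X @ W, assuming Gaussian noise with
    precision `alpha` and a zero-mean Gaussian prior on W with precision
    `lam` (both fixed here; a full treatment would estimate them)."""
    d = X.shape[1]
    A = alpha * X.T @ X + lam * np.eye(d)
    return np.linalg.solve(A, alpha * X.T @ Y)

def prioritize(X_train, Y_train, X_cand, alpha=1.0, lam=1e-2):
    """Rank candidate genes by cosine similarity between their predicted
    phenotype profile and the mean phenotype profile of the training genes
    (an assumed stand-in for 'phenotypic concordance')."""
    W = fit_bayesian_ridge(X_train, Y_train, alpha, lam)
    Y_pred = X_cand @ W                      # predicted phenotype profiles
    ref = Y_train.mean(axis=0)               # training-set phenotype profile
    denom = np.linalg.norm(Y_pred, axis=1) * np.linalg.norm(ref) + 1e-12
    scores = (Y_pred @ ref) / denom
    order = np.argsort(-scores)              # best candidate first
    return order, scores

# Synthetic toy data (hypothetical): 50 training genes, 6 functional
# features, 4 phenotype features linked by a hidden linear map.
rng = np.random.default_rng(0)
center = np.array([1.0, 0.5, -0.5, 2.0, 0.0, 1.5])
X_train = center + rng.normal(size=(50, 6))
W_true = rng.normal(size=(6, 4))
Y_train = X_train @ W_true

# Candidates: one resembling the training genes, its mirror image,
# and three unrelated random profiles.
mean_profile = X_train.mean(axis=0)
X_cand = np.vstack([mean_profile, -mean_profile, rng.normal(size=(3, 6))])

order, scores = prioritize(X_train, Y_train, X_cand)
print("ranking (best first):", order)
print("scores:", np.round(scores, 3))
```

In this toy setup the candidate whose functional profile matches the training genes ranks first and its mirror image receives a negative score. Real annotation matrices are sparse and high-dimensional, which is where the information-theoretic weighting and data fusion described above would come in; the sketch omits both.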

AVAILABILITY AND IMPLEMENTATION

http://biomina.be/apps/pbrit/; https://bitbucket.org/medgenua/pbrit.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f7c8/6022555/289d8bf67abf/bty079f1.jpg
