Benchmarking network-based gene prioritization methods for cerebral small vessel disease.

Authors

Zhang Huayu, Ferguson Amy, Robertson Grant, Jiang Muchen, Zhang Teng, Sudlow Cathie, Smith Keith, Rannikmae Kristiina, Wu Honghan

Affiliations

Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom.

Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, Edinburgh, United Kingdom.

Publication

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab006.

Abstract

Network-based gene prioritization algorithms are designed to prioritize disease-associated genes based on known ones, using biological networks of protein interactions, gene-disease associations (GDAs) and other relationships between biological entities. Various algorithms have been developed based on different mechanisms, but it is not obvious which algorithm is optimal for a specific disease. To address this issue, we benchmarked multiple algorithms for their application in cerebral small vessel disease (cSVD). We curated protein-gene interactions (PGIs) and GDAs from databases and assembled PGI networks and disease-gene heterogeneous networks. A screening of algorithms yielded seven representative algorithms to be benchmarked. The performance of the algorithms was assessed using both leave-one-out cross-validation (LOOCV) and external validation with the MEGASTROKE genome-wide association study (GWAS). We found that random walk with restart on the heterogeneous network (RWRH) showed the best LOOCV performance, with a median LOOCV rediscovery rank of 185.5 (out of 19 463 genes). The GenePanda algorithm had the most GWAS-confirmable genes in its top 200 predictions, while RWRH had the best ranks for small vessel stroke-associated genes confirmed in GWAS. In conclusion, RWRH has better overall performance for application in cSVD despite its susceptibility to bias caused by degree centrality. The choice of algorithm should be determined before it is applied to a specific disease. Current pure network-based gene prioritization algorithms are unlikely to find novel disease-associated genes that are not associated with known ones. The tools for implementing and benchmarking algorithms have been made available and can be generalized to other diseases.
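To make the LOOCV rediscovery rank concrete, the following is a minimal sketch and not the authors' released tooling. It runs a plain random walk with restart on a single homogeneous toy network rather than RWRH on a disease-gene heterogeneous network. The function names, the restart probability of 0.7, the convergence tolerance and the six-node graph are all assumptions made for illustration.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seeds.

    restart=0.7 is an illustrative assumption, not a value taken from the paper.
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # guard against isolated nodes
    transition = adj / col_sums            # column-normalised transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)     # uniform restart distribution over seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def loocv_rediscovery_ranks(adj, known_genes, restart=0.7):
    """Hold out each known gene in turn and record its rank among non-seed genes."""
    ranks = {}
    for held_out in known_genes:
        seeds = [g for g in known_genes if g != held_out]
        scores = random_walk_with_restart(adj, seeds, restart)
        candidates = [g for g in range(len(scores)) if g not in seeds]
        ordered = sorted(candidates, key=lambda g: -scores[g])
        ranks[held_out] = ordered.index(held_out) + 1   # rank 1 is best
    return ranks


# Hypothetical toy network: genes 0, 1, 2 form a triangle; 2-3-4-5 is a chain.
toy_adj = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
])
toy_known = [0, 1, 2]                      # pretend these are the known disease genes

toy_ranks = loocv_rediscovery_ranks(toy_adj, toy_known)
median_rank = float(np.median(list(toy_ranks.values())))
print(toy_ranks, median_rank)
```

A lower median rank means held-out known genes are recovered nearer the top of the candidate list. In this sketch the ranking is over non-seed nodes of the toy graph only; the benchmark reported above ranks across 19 463 genes.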


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c874/8425308/e1449d7260d0/bbab006f1.jpg
