School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China.
Cumming School of Medicine, University of Calgary, Calgary, Canada.
BMC Bioinformatics. 2024 Mar 6;25(Suppl 1):100. doi: 10.1186/s12859-024-05709-6.
Over the past decade, single nucleotide variants (SNVs) have been shown to be significantly associated with the development and treatment of diseases. Prioritizing missense variants for further investigation of their functional impact is a central challenge in the study of common diseases and cancer. Although several computational methods have been developed to predict the functional impact of variants, their predictive ability remains insufficient for Mendelian and cancer missense variants.
We present a novel prediction method, disease-related variant annotation (DVA), that predicts the effect of missense variants from a comprehensive set of variant features, notably allele frequency and a protein-protein interaction network feature based on graph embedding. Benchmarked on datasets of single nucleotide missense variants, DVA outperforms state-of-the-art methods by up to 0.473 in area under the receiver operating characteristic curve (AUC). These results indicate that the proposed method can accurately predict the functional impact of single nucleotide missense variants.
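For readers unfamiliar with the metric, the following is a minimal sketch of how the area under the receiver operating characteristic curve can be computed and compared between two predictors. It uses the pairwise (Mann-Whitney) formulation: the AUC equals the fraction of (pathogenic, benign) pairs in which the pathogenic variant receives the higher score. The function name `auroc`, the toy labels, and both score lists are hypothetical illustrations; they are not taken from DVA or from the benchmark datasets.

```python
def auroc(labels, scores):
    """AUC via the pairwise (Mann-Whitney) formulation.

    labels: 1 for a pathogenic variant, 0 for a benign one (assumed encoding).
    scores: predictor outputs, higher meaning more likely pathogenic.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # pathogenic variant ranked above benign one
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]  # made-up predictor A
scores_b = [0.6, 0.3, 0.2, 0.5, 0.4, 0.1]  # made-up predictor B

auc_a = auroc(labels, scores_a)
auc_b = auroc(labels, scores_b)
auc_gap = auc_a - auc_b  # the kind of difference an "outperforms by" figure reports
```

The quadratic pairwise loop is chosen for clarity; for large benchmark sets, a rank-based computation or an established library routine would be the more practical choice.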
DVA is an effective framework for identifying the functional impact of disease-related missense variants based on a comprehensive feature set. Evaluated on different datasets, DVA demonstrates generalization ability and robustness, and it offers new ideas for studying the functional mechanisms and impact of SNVs.