Zhang Haicang, Xu Michelle S, Fan Xiao, Chung Wendy K, Shen Yufeng
Department of Systems Biology, Columbia University, New York, NY, USA.
Columbia College, Columbia University, New York, USA.
Nat Mach Intell. 2022 Nov;4(11):1017-1028. doi: 10.1038/s42256-022-00561-w. Epub 2022 Nov 15.
Accurate prediction of damaging missense variants is critically important for interpreting a genome sequence. Although many methods have been developed, their performance has been limited. Recent advances in machine learning and the availability of large-scale population genomic sequencing data provide new opportunities to considerably improve computational predictions. Here we describe the graphical missense variant pathogenicity predictor (gMVP), a new method based on graph attention neural networks. Its main component is a graph with nodes that capture predictive features of amino acids and edges weighted by co-evolution strength, enabling effective pooling of information from the local protein context and functionally correlated distal positions. Evaluation on deep mutational scan data shows that gMVP outperforms other published methods in identifying damaging variants in several genes. Furthermore, it achieves the best separation of de novo missense variants in neurodevelopmental disorder cases from those in controls. Finally, the model supports transfer learning to optimize gain- and loss-of-function predictions in sodium and calcium channels. In summary, we demonstrate that gMVP can improve interpretation of missense variants in clinical testing and genetic studies.
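The pooling idea described in the abstract can be illustrated with a minimal sketch. It is not the gMVP implementation: the function names are hypothetical, there are no learned projection matrices, and adding the logarithm of the co-evolution strength to the attention score is an assumption chosen only to show how edge weights could bias which positions contribute to the pooled representation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_pool(center, neighbors, coevolution):
    """Pool neighbor features into the node at the variant position.

    center      -- feature vector of the variant position (hypothetical features)
    neighbors   -- list of feature vectors for context positions
    coevolution -- positive co-evolution strength for each edge
    ASSUMPTION: the edge weight enters as log(strength) added to a scaled
    dot-product score; the published model may combine them differently.
    """
    scale = math.sqrt(len(center))
    scores = [
        dot(center, n) / scale + math.log(c)
        for n, c in zip(neighbors, coevolution)
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * n[i] for w, n in zip(weights, neighbors))
        for i in range(len(center))
    ]
    return pooled, weights


# Toy usage: two context positions, the first co-evolving more strongly.
pooled, weights = attention_pool(
    center=[0.0, 0.0],
    neighbors=[[1.0, 0.0], [0.0, 1.0]],
    coevolution=[3.0, 1.0],
)
```

With a zero feature vector at the variant position, the dot-product term vanishes and the attention weights reduce to the normalized co-evolution strengths, so the more strongly co-evolving position dominates the pooled vector.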