School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215000, China; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215000, China.
J Mol Biol. 2019 Jun 14;431(13):2449-2459. doi: 10.1016/j.jmb.2019.02.017. Epub 2019 Feb 21.
Nearly one-third of non-synonymous single-nucleotide polymorphisms (nsSNPs) are deleterious to human health, but recognition of disease-associated mutations remains a significant unsolved problem. We proposed a new algorithm, DAMpred, to identify disease-causing nsSNPs by coupling evolutionary profiles with structure predictions of proteins and protein-protein interactions. The pipeline was trained by a novel Bayes-guided artificial neural network algorithm that incorporates the posterior probabilities of distinct feature classifiers into the network training process. DAMpred was tested on a large-scale data set of 10,635 nsSNPs from 2154 ORFs in the human genome and recognized disease-associated nsSNPs with an accuracy of 0.80 and a Matthews correlation coefficient of 0.601, which is 9.1% higher than the best of the other state-of-the-art methods. In a blind test on the TP53 gene, DAMpred correctly recognized the mutations that cause Li-Fraumeni-like syndrome with a Matthews correlation coefficient 27% higher than that of the control methods. The study demonstrates an efficient way to quantitatively model the association of nsSNPs with human diseases from low-resolution protein structure prediction, which should be useful in the diagnosis and treatment of genetic diseases.
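For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how accuracy and the Matthews correlation coefficient are computed from a binary confusion matrix. The function names and the counts in the usage lines are hypothetical, chosen only for illustration; they are not taken from the DAMpred benchmark and this is not the authors' evaluation code.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention),
    since the coefficient is undefined in that case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts (NOT from the paper): 100 variants, 50 disease-associated.
tp, tn, fp, fn = 40, 40, 10, 10
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix and ranges from -1 to 1, which makes it less sensitive to class imbalance between disease-associated and neutral variants.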