School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China.
School of Biomedical Engineering, Sun Yat-Sen University, Guangzhou, 510275, China.
Interdiscip Sci. 2024 Sep;16(3):649-664. doi: 10.1007/s12539-024-00615-0. Epub 2024 Mar 8.
As one of the most important post-translational modifications (PTMs), protein phosphorylation plays a key role in a variety of biological processes, and many studies have shown that it is associated with various human diseases. Identifying associations between protein phosphorylation sites and diseases can therefore help to elucidate disease pathogenesis and uncover new drug targets. For phosphorylation sites, networks of sequence similarity and Gaussian interaction profile kernel similarity were constructed; for diseases, networks of semantic similarity, symptom similarity and Gaussian interaction profile kernel similarity were constructed. To combine these different sources of similarity information effectively, the random walk with restart algorithm was used to capture the topology of each network, and diffusion component analysis was then applied to obtain integrated phosphorylation site similarity and disease similarity. Reliable negative samples were screened using a Euclidean distance-based method. Finally, a convolutional neural network (CNN) model was constructed to identify potential associations between phosphorylation sites and diseases. Under tenfold cross-validation, the model achieved an accuracy of 93.48%, specificity of 96.82%, sensitivity of 90.15%, precision of 96.62%, Matthews correlation coefficient of 0.8719, area under the receiver operating characteristic curve of 0.9786 and area under the precision-recall curve of 0.9836. In addition, most of the top 20 predicted disease-related phosphorylation sites (19/20 for Alzheimer's disease; 16/20 for neuroblastoma) were verified in the literature and databases. These results suggest that the proposed method achieves strong predictive performance and has high practical value.
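To illustrate the random walk with restart step mentioned in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the restart probability, the normalization scheme or the convergence criterion, so the row normalization, the value `restart=0.5`, the tolerance and the function names `row_normalize` and `random_walk_with_restart` are all assumptions chosen for illustration.

```python
def row_normalize(sim):
    """Convert a non-negative similarity matrix into a row-stochastic
    transition matrix (assumed normalization; not specified in the abstract)."""
    out = []
    for row in sim:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out


def random_walk_with_restart(sim, restart=0.5, tol=1e-10, max_iter=1000):
    """Return one diffusion profile per node.

    Row i is the stationary distribution of a walker that starts at node i,
    follows the transition matrix with probability (1 - restart), and jumps
    back to node i with probability `restart` (hypothetical default 0.5).
    """
    n = len(sim)
    trans = row_normalize(sim)
    profiles = []
    for seed in range(n):
        start = [1.0 if j == seed else 0.0 for j in range(n)]
        p = start[:]
        for _ in range(max_iter):
            # p_next = (1 - restart) * p * trans + restart * start
            nxt = [
                (1.0 - restart) * sum(p[k] * trans[k][j] for k in range(n))
                + restart * start[j]
                for j in range(n)
            ]
            delta = sum(abs(a - b) for a, b in zip(nxt, p))
            p = nxt
            if delta < tol:
                break
        profiles.append(p)
    return profiles
```

Each returned row can be read as a topology-aware profile of one node (a phosphorylation site or a disease); in a pipeline like the one described, profiles from several similarity networks would then be passed to a dimensionality reduction step such as diffusion component analysis.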