School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330013, P. R. China.
Jiangxi Engineering Research Center of Unattended Perception System and Artificial Intelligence Technology, Jiangxi Science and Technology Normal University, Jiangxi 330088, P. R. China.
J Bioinform Comput Biol. 2023 Oct;21(5):2350023. doi: 10.1142/S0219720023500233. Epub 2023 Oct 27.
Various diseases, including Huntington's disease, Alzheimer's disease, and Parkinson's disease, have been reported to be linked to amyloid. It is therefore crucial to distinguish amyloid from non-amyloid proteins or peptides. Experimental approaches are typically preferred, but they are costly and time-consuming. In this study, we developed a machine learning framework called iAMY-RECMFF to discriminate amyloidogenic from non-amyloidogenic peptides. In our model, we first encoded the peptide sequences using the residue pairwise energy content matrix. We then used Pearson's correlation coefficient and distance correlation to extract useful information from this matrix. Additionally, we employed an improved similarity network fusion algorithm to integrate features from different perspectives. The Fisher approach was adopted to select the optimal feature subset. Finally, the selected features were fed into a support vector machine to identify amyloidogenic peptides. Experimental results demonstrate that our proposed method significantly improves the identification of amyloidogenic peptides compared to existing predictors. This suggests that our method may serve as a powerful tool for identifying amyloidogenic peptides. To facilitate academic use, the dataset and code used in the current study are accessible at https://figshare.com/articles/online_resource/iAMY-RECMFF/22816916.
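Two of the steps named in the abstract, Pearson-correlation feature extraction and Fisher-based feature ranking, can be illustrated with a minimal sketch. It is not the authors' implementation, which is available at the figshare link. Several things here are assumptions made for illustration: the per-residue row encoding in `encode`, the common two-class Fisher score formula, and all function names. `TOY_ENERGY` is a deterministic placeholder, not the real residue pairwise energy content matrix. Distance correlation, similarity network fusion, and the support vector machine are omitted.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder only: a deterministic symmetric 20x20 table standing in for the
# real residue pairwise energy content matrix, whose values are not given here.
TOY_ENERGY = [[-(((i + 1) * (j + 1)) % 11) / 10.0 for j in range(20)]
              for i in range(20)]


def encode(peptide):
    """Assumed encoding: one 20-dimensional energy row per residue (L x 20)."""
    return [TOY_ENERGY[AMINO_ACIDS.index(aa)] for aa in peptide]


def pearson(x, y):
    """Pearson's correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def pcc_features(peptide):
    """Correlate every pair of columns of the encoded matrix (190 values)."""
    matrix = encode(peptide)
    cols = [[row[j] for row in matrix] for j in range(20)]
    return [pearson(cols[i], cols[j])
            for i in range(20) for j in range(i + 1, 20)]


def fisher_scores(X, y):
    """Two-class Fisher score per feature: (mu1 - mu0)^2 / (var1 + var0)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        p = [row[j] for row in pos]
        q = [row[j] for row in neg]
        mp, mq = sum(p) / len(p), sum(q) / len(q)
        vp = sum((v - mp) ** 2 for v in p) / len(p)
        vq = sum((v - mq) ** 2 for v in q) / len(q)
        denom = vp + vq
        if denom > 0:
            scores.append((mp - mq) ** 2 / denom)
        else:
            # Zero within-class variance: infinite score if the means differ.
            scores.append(float("inf") if mp != mq else 0.0)
    return scores


def select_top_k(X, y, k):
    """Keep the k features with the highest Fisher score."""
    scores = fisher_scores(X, y)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(order[:k])
    return keep, [[row[j] for j in keep] for row in X]
```

In a fuller pipeline, the reduced feature matrix returned by `select_top_k` would be passed to a classifier such as a support vector machine, with the number of retained features chosen by cross-validation.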