College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China.
College of Computer Science and Technology, Harbin Institute of Technology, Harbin 150040, China.
Comput Math Methods Med. 2020 Nov 20;2020:8845133. doi: 10.1155/2020/8845133. eCollection 2020.
Amyloid is generally an aggregate of insoluble fibrous protein; its abnormal deposition is the pathogenic mechanism of various diseases, such as Alzheimer's disease and type II diabetes. Accurately identifying amyloid is therefore necessary to understand its role in pathology. We propose a machine learning-based prediction model called PredAmyl-MLP, which consists of three steps: feature extraction, feature selection, and classification. In the feature extraction step, seven feature extraction algorithms and different combinations of them are investigated, and the combination of SVMProt-188D and tripeptide composition (TPC) is selected according to the experimental results. In the feature selection step, maximum relevant maximum distance (MRMD) and binomial distribution (BD) are each used to remove redundant or noisy features, and the appropriate features are selected according to the experimental results. In the classification step, a multilayer perceptron (MLP) is employed to train the prediction model. The 10-fold cross-validation results show that the overall accuracy of PredAmyl-MLP reached 91.59%, which was better than that of the existing methods.
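As an illustration of the tripeptide composition (TPC) descriptor named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the common definition of TPC as the relative frequency of each of the 20^3 = 8000 possible tripeptides in a sequence. The names `AMINO_ACIDS`, `TRIPEPTIDE_INDEX`, and `tripeptide_composition` are hypothetical, and skipping windows that contain non-standard residues is an assumption made here for robustness.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Map each of the 8000 possible tripeptides to a fixed vector position.
TRIPEPTIDE_INDEX = {
    "".join(triple): i
    for i, triple in enumerate(product(AMINO_ACIDS, repeat=3))
}


def tripeptide_composition(sequence):
    """Return an 8000-dimensional vector of tripeptide frequencies.

    Assumption: windows containing a non-standard residue (e.g. 'X')
    are skipped, and counts are normalized by the number of valid
    windows, so the vector sums to 1 when at least one window is valid.
    """
    seq = sequence.upper()
    counts = [0] * len(TRIPEPTIDE_INDEX)
    valid_windows = 0
    # Slide a window of length 3 over the sequence.
    for start in range(len(seq) - 2):
        index = TRIPEPTIDE_INDEX.get(seq[start:start + 3])
        if index is not None:
            counts[index] += 1
            valid_windows += 1
    if valid_windows == 0:
        # Sequence too short or no valid tripeptides: all-zero vector.
        return [0.0] * len(counts)
    return [count / valid_windows for count in counts]


if __name__ == "__main__":
    features = tripeptide_composition("ACDACD")
    # Windows are ACD, CDA, DAC, ACD, so ACD accounts for 2 of 4.
    print(len(features), features[TRIPEPTIDE_INDEX["ACD"]])
```

In a pipeline like the one the abstract describes, a vector of this kind would be concatenated with the other selected descriptor, reduced by feature selection, and passed to the classifier.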