Key Laboratory of Agricultural Electronic Commerce, Ministry of Agriculture, Hefei 230036, China.
Institute of Intelligent Agriculture, Anhui Agricultural University, Hefei 230036, China.
Math Biosci Eng. 2022 Jan 7;19(3):2471-2488. doi: 10.3934/mbe.2022114.
Protein function prediction is vital for annotating uncharacterized proteins. At present, protein function prediction based on deep neural networks (DNNs) is carried out mainly on small-scale protein datasets or on the Gene Ontology, and usually explores the relationship between a single protein feature and function labels. Practical methods for large-scale protein function prediction that use multiple features still need in-depth study. This paper proposes IGP-DNN, a DNN-based protein function prediction approach. The method extracts protein module features with a protein function module extraction algorithm based on the Grasshopper Optimization Algorithm (GOA) and Intuitionistic Fuzzy c-Means clustering (IFCM), uses Kernel Principal Component Analysis (KPCA) to reduce the dimensionality of protein attribute information, and integrates the module features with the attribute features. The integrated data are fed into a DNN with multiple hidden layers, which classifies proteins and thereby predicts their functions. In the experiments, IGP-DNN reaches an F-measure of 0.4436 on the DIP dataset, which indicates better performance.
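The abstract names the Grasshopper Optimization Algorithm (GOA) but does not say how it is coupled to IFCM. As a hedged illustration only, the sketch below shows a compact version of the standard GOA search loop minimizing a generic objective. The names `goa_minimize` and `_social_force`, the defaults (f = 0.5, l = 1.5, a coefficient c shrinking linearly from 1 to 4e-5), and the mapping of pairwise distances into [2, 4) follow common GOA formulations and are assumptions, not details taken from IGP-DNN.

```python
import numpy as np
from typing import Callable, List, Tuple


def _social_force(r: float, f: float = 0.5, l: float = 1.5) -> float:
    """GOA social-interaction function s(r) = f*exp(-r/l) - exp(-r) (hypothetical helper)."""
    return float(f * np.exp(-r / l) - np.exp(-r))


def goa_minimize(objective: Callable[[np.ndarray], float],
                 lb: np.ndarray, ub: np.ndarray,
                 n_agents: int = 30, max_iter: int = 200,
                 c_max: float = 1.0, c_min: float = 4e-5,
                 seed: int = 0) -> Tuple[np.ndarray, float, List[float]]:
    """Minimize `objective` inside the box [lb, ub] with a basic GOA loop (sketch).

    Returns the best position, its objective value, and the best value per iteration.
    """
    rng = np.random.default_rng(seed)
    dim = lb.size
    pos = lb + rng.random((n_agents, dim)) * (ub - lb)
    fitness = np.array([objective(p) for p in pos])
    best = int(np.argmin(fitness))
    target, target_fit = pos[best].copy(), float(fitness[best])
    history = [target_fit]

    for it in range(1, max_iter + 1):
        # Comfort-zone coefficient decreases linearly from c_max to c_min
        c = c_max - it * (c_max - c_min) / max_iter
        new_pos = np.empty_like(pos)
        for i in range(n_agents):
            total = np.zeros(dim)
            for j in range(n_agents):
                if i == j:
                    continue
                diff = pos[j] - pos[i]
                dist = float(np.linalg.norm(diff))
                unit = diff / (dist + 1e-12)
                # Assumption: distance mapped into [2, 4) before applying s, as in common GOA code
                r = 2.0 + np.fmod(dist, 2.0)
                total += c * (ub - lb) / 2.0 * _social_force(r) * unit
            # New position = social term scaled by c, centred on the best solution so far
            new_pos[i] = np.clip(c * total + target, lb, ub)
        pos = new_pos  # synchronous update: all agents moved using the old positions
        fitness = np.array([objective(p) for p in pos])
        best = int(np.argmin(fitness))
        if fitness[best] < target_fit:
            target, target_fit = pos[best].copy(), float(fitness[best])
        history.append(target_fit)
    return target, target_fit, history
```

In a clustering setting the objective could, hypothetically, score a flattened vector of candidate cluster centers; here it is any callable that maps a position vector to a float, and the returned history is non-increasing by construction.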
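For IFCM, one common formulation computes ordinary fuzzy c-means memberships, derives a Yager-type non-membership and a hesitation degree from them, and updates the cluster centers with the hesitation-adjusted (intuitionistic) memberships. The sketch below assumes that formulation with m = 2 and alpha = 0.85 (alpha < 1 keeps the hesitation degree non-negative). The paper may use a different IFCM variant, `ifcm` is a hypothetical name, and the farthest-point initialisation is a simple stand-in for whatever role GOA plays in the original algorithm.

```python
import numpy as np
from typing import Tuple


def ifcm(X: np.ndarray, n_clusters: int, m: float = 2.0, alpha: float = 0.85,
         max_iter: int = 100, tol: float = 1e-6, seed: int = 0
         ) -> Tuple[np.ndarray, np.ndarray]:
    """Intuitionistic fuzzy c-means sketch.

    Returns (centers, memberships); memberships has shape (n_clusters, n_samples).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Farthest-point initialisation (assumption; stands in for a GOA-based search)
    chosen = [X[rng.integers(n)]]
    for _ in range(1, n_clusters):
        gaps = np.linalg.norm(X[:, None, :] - np.array(chosen)[None, :, :], axis=2)
        chosen.append(X[np.argmax(gaps.min(axis=1))])
    centers = np.array(chosen, dtype=float)

    u_star = np.zeros((n_clusters, n))
    for _ in range(max_iter):
        # d_ik: distance from cluster center i to sample k (epsilon avoids division by zero)
        dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + 1e-12
        # Standard FCM membership: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        ratio = (dist[:, None, :] / dist[None, :, :]) ** (2.0 / (m - 1.0))
        u = 1.0 / ratio.sum(axis=1)
        # Yager-type non-membership v and hesitation degree pi = 1 - u - v
        nonmember = (1.0 - u ** alpha) ** (1.0 / alpha)
        hesitation = 1.0 - u - nonmember
        # Intuitionistic membership u* = u + pi; clip only guards against round-off
        u_star = np.clip(u + hesitation, 0.0, 1.0)
        weights = u_star ** m
        new_centers = (weights @ X) / weights.sum(axis=1, keepdims=True)
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, u_star
```

Each column of the returned membership matrix describes how strongly one sample belongs to each cluster; how IGP-DNN turns module assignments on the protein network into per-protein module features is not specified in the abstract.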
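The KPCA step and the feature integration can be sketched as follows, assuming an RBF kernel and assuming that module features are already available as one numeric row per protein (for example, module membership degrees). The kernel choice, `gamma`, `n_components`, and the helper names `rbf_kernel_pca` and `integrate_features` are illustrative assumptions; the abstract specifies none of them.

```python
import numpy as np


def rbf_kernel_pca(X: np.ndarray, n_components: int, gamma: float) -> np.ndarray:
    """Project the rows of X onto the leading kernel principal components (RBF kernel)."""
    # RBF kernel K_ij = exp(-gamma * ||x_i - x_j||^2)
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    K = np.exp(-gamma * np.maximum(sq_dists, 0.0))
    # Center K in feature space: K_c = K - 1K - K1 + 1K1, where 1 = (1/n) * ones
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues in ascending order; keep the largest n_components
    eigvals, eigvecs = np.linalg.eigh(K_c)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Projection of training sample j onto component k is sqrt(lambda_k) * alpha_jk
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


def integrate_features(module_features: np.ndarray, attribute_features: np.ndarray,
                       n_components: int = 8, gamma: float = 0.1) -> np.ndarray:
    """Concatenate module features with KPCA-reduced attribute features, row by row."""
    reduced = rbf_kernel_pca(attribute_features, n_components, gamma)
    return np.hstack([module_features, reduced])
```

scikit-learn's `sklearn.decomposition.KernelPCA(kernel="rbf")` computes the same kind of projection (component signs may differ); the explicit NumPy version only makes the kernel centering and eigendecomposition visible. The concatenated matrix is what would be passed to the DNN classifier.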
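The abstract reports an F-measure of 0.4436 on DIP without giving its exact definition. A common choice in protein function prediction computes precision and recall per protein over its sets of predicted and known functions and averages the resulting F-measures; the sketch below assumes that definition, and `f_measure` is a hypothetical name.

```python
from typing import Dict, Hashable, Set


def f_measure(predicted: Dict[Hashable, Set[str]],
              actual: Dict[Hashable, Set[str]]) -> float:
    """Average per-protein F-measure over proteins with known annotations.

    Assumption: F is computed per protein and then averaged; the paper may
    aggregate precision and recall differently.
    """
    scores = []
    for protein, true_funcs in actual.items():
        if not true_funcs:
            continue  # proteins without known functions cannot be scored
        pred_funcs = predicted.get(protein, set())
        hits = len(pred_funcs & true_funcs)
        if hits == 0:
            scores.append(0.0)
            continue
        precision = hits / len(pred_funcs)
        recall = hits / len(true_funcs)
        scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores) if scores else 0.0
```

For example, with hypothetical predictions {"P1": {"f1", "f2"}, "P2": {"f3"}} and known functions {"P1": {"f1"}, "P2": {"f4"}}, protein P1 has precision 0.5 and recall 1, giving F = 2 × 0.5 × 1 / 1.5 ≈ 0.667, while P2 scores 0, so the average is about 0.333.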