Subcellular location prediction of apoptosis proteins using two novel feature extraction methods based on evolutionary information and LDA.

Affiliations

School of Information Science and Engineering, University of Jinan, Jinan, 250022, China.

Shandong Provincial Key laboratory of Network Based Intelligent Computing, Jinan, 250022, China.

Publication information

BMC Bioinformatics. 2020 May 24;21(1):212. doi: 10.1186/s12859-020-3539-1.

Abstract

BACKGROUND

Apoptosis, also called programmed cell death, refers to the spontaneous and orderly death of cells controlled by genes in order to maintain a stable internal environment. Identifying the subcellular location of apoptosis proteins is very helpful in understanding the mechanism of apoptosis and designing drugs. Therefore, the subcellular localization of apoptosis proteins has attracted increased attention in computational biology. Effective feature extraction methods play a critical role in predicting the subcellular location of proteins.

RESULTS

In this paper, we propose two novel feature extraction methods based on evolutionary information. The first obtains evolutionary information from the transition matrix of the consensus sequence (CTM). The second extracts evolutionary information from the position-specific scoring matrix (PSSM) using absolute entropy correlation analysis (AECA-PSSM). After the two kinds of features were fused, linear discriminant analysis (LDA) was used to reduce their dimensionality. Finally, a support vector machine (SVM) was used to predict the protein subcellular locations. The proposed CTM-AECA-PSSM-LDA subcellular location prediction method was evaluated on the CL317 and ZW225 datasets. In the jackknife test, the overall accuracy was 99.7% on CL317 and 95.6% on ZW225.
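The jackknife test mentioned above is a leave-one-out evaluation: each protein is held out once, the model is trained on the remaining proteins, and the held-out protein is then predicted. The following is a minimal sketch of that protocol only, not the authors' implementation. The function names are hypothetical, and a simple nearest-centroid classifier stands in for the LDA and SVM steps so that the example needs nothing beyond the standard library.

```python
from collections import Counter


def nearest_centroid_predict(train_x, train_y, query):
    """Stand-in classifier (an assumption, NOT the paper's SVM):
    return the label whose mean feature vector is closest to `query`."""
    counts = Counter(train_y)
    sums = {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [value / counts[label] for value in acc]
        dist = sum((q - c) ** 2 for q, c in zip(query, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(features, labels, predict=nearest_centroid_predict):
    """Leave-one-out (jackknife) overall accuracy.

    Each sample is held out in turn; the classifier sees only the
    remaining samples and must predict the held-out label.
    """
    features = list(features)
    labels = list(labels)
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


if __name__ == "__main__":
    # Toy feature vectors for two hypothetical location classes.
    toy_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    toy_y = ["cyto", "cyto", "cyto", "mito", "mito", "mito"]
    print(jackknife_accuracy(toy_x, toy_y))
```

In a fuller pipeline, any step that learns from the labels (for example an LDA projection) would normally be fitted inside `predict`, on the training portion of each fold, so that the held-out protein does not influence the model that predicts it.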

CONCLUSIONS

The experimental results show that the proposed method can effectively extract richer features from protein sequences and is feasible for predicting the subcellular location of apoptosis proteins. It may serve as a complementary tool to existing subcellular localization methods.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77ab/7245797/ea524196ac29/12859_2020_3539_Fig1_HTML.jpg
