

ProClusEnsem: predicting membrane protein types by fusing different modes of pseudo amino acid composition.

Affiliation

Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Jeddah, Saudi Arabia.

Publication information

Comput Biol Med. 2012 May;42(5):564-74. doi: 10.1016/j.compbiomed.2012.01.012. Epub 2012 Mar 3.

Abstract

Knowing the type of an uncharacterized membrane protein often provides a useful clue in both basic research and drug discovery. With the explosion of protein sequences generated in the post-genomic era, determining membrane protein types by experimental methods is expensive and time-consuming. It is therefore important to develop automated methods for predicting the possible types of membrane proteins. To this end, various computational methods for membrane protein type prediction have been proposed. They extract protein feature vectors, such as PseAAC (pseudo amino acid composition) and PsePSSM (pseudo position-specific scoring matrix), to represent protein sequences, and then learn a distance metric for a KNN (K-nearest neighbor) or NN (nearest neighbor) classifier to predict the final type. Most of these metrics are learned with linear dimensionality reduction algorithms such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Such metrics are shared by all the proteins in the dataset; in effect, they assume that the proteins follow a uniform distribution that a linear dimensionality reduction algorithm can capture. We question this assumption and instead learn local metrics, each optimized for a local subset of the proteins. The metric learning is iterated together with protein clustering. A novel ensemble distance metric is then obtained by combining the local metrics through Tikhonov regularization. Experimental results on a benchmark dataset demonstrate the feasibility and effectiveness of the proposed algorithm, named ProClusEnsem.
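The abstract does not give the formulas, so the following is only a minimal NumPy sketch of the general idea (local metrics per cluster, combined into one ensemble metric for nearest-neighbor classification), written under our own assumptions and not the authors' ProClusEnsem implementation. Assumptions: proteins are already encoded as fixed-length feature vectors (e.g., PseAAC); plain k-means stands in for the protein clustering step; each cluster's local metric M_j is taken to be the inverse of its regularized within-class scatter (an LDA-style choice); and the combination weights w for a query x come from a hypothetical Tikhonov-regularized problem, minimizing sum_j w_j r_j + lam * ||w||^2 over the probability simplex, where r_j is the squared distance from x to cluster center j. The distance to a training protein x_i is then (x_i - x)^T M (x_i - x) with M = sum_j w_j M_j. The sketch clusters only once and does not reproduce the iteration between clustering and metric learning described above. All function names (kmeans, local_metric, project_to_simplex, fit, predict) are hypothetical.

```python
import numpy as np


def _sq_dists(A, B):
    """Squared Euclidean distances between rows of A (n, d) and rows of B (m, d)."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (a stand-in for the paper's protein clustering step)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _sq_dists(X, centers).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return _sq_dists(X, centers).argmin(axis=1), centers


def local_metric(Xc, yc, eps=1e-2):
    """Assumed LDA-style local metric: inverse of the regularized within-class scatter."""
    d = Xc.shape[1]
    Sw = np.zeros((d, d))
    for c in np.unique(yc):
        D = Xc[yc == c] - Xc[yc == c].mean(axis=0)
        Sw += D.T @ D
    Sw /= max(len(Xc), 1)
    return np.linalg.inv(Sw + eps * np.eye(d))


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * ks > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def fit(X, y, k=2, eps=1e-2, seed=0):
    """Cluster the training proteins once and learn one local metric per cluster."""
    labels, centers = kmeans(X, k, seed=seed)
    metrics = [local_metric(X[labels == j], y[labels == j], eps) for j in range(k)]
    return centers, metrics


def predict(x, X, y, centers, metrics, lam=1.0, n_neighbors=1):
    """KNN prediction under a query-specific ensemble of the local metrics."""
    # Hypothetical Tikhonov-regularized weights: minimizing r.w + lam*||w||^2 over the
    # simplex is the same as projecting -r / (2*lam) onto the simplex.
    r = ((centers - x) ** 2).sum(axis=1)
    w = project_to_simplex(-r / (2.0 * lam))
    # Nonnegative weights summing to 1 keep the combined matrix positive definite.
    M = sum(wj * Mj for wj, Mj in zip(w, metrics))
    D = X - x
    dist = np.einsum("ij,jk,ik->i", D, M, D)  # (x_i - x)^T M (x_i - x) for every row
    nn = np.argsort(dist)[:n_neighbors]
    vals, counts = np.unique(y[nn], return_counts=True)
    return vals[counts.argmax()]


# Toy usage on synthetic 2-D "feature vectors" (not PseAAC data): two separated classes.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(10.0, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
centers, metrics = fit(X, y, k=2)
print(predict(np.array([0.5, 0.2]), X, y, centers, metrics))
print(predict(np.array([9.8, 10.1]), X, y, centers, metrics))
```

In this sketch the lam term controls how local the ensemble is: for large lam the weights approach uniform, so every query uses nearly the same averaged metric (close to a single global metric), while for small lam each query uses only the metric of its nearest cluster.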

