Protein Remote Homology Detection and Fold Recognition Based on Sequence-Order Frequency Matrix.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2019 Jan-Feb;16(1):292-300. doi: 10.1109/TCBB.2017.2765331. Epub 2017 Oct 23.

Abstract

Protein remote homology detection and fold recognition are two critical tasks in the study of protein structure and function. Profile-based methods currently achieve state-of-the-art performance on both tasks. However, widely used sequence profiles, such as the position-specific frequency matrix (PSFM) and the position-specific scoring matrix (PSSM), ignore sequence-order effects along the protein sequence. In this study, we propose a novel profile, called the sequence-order frequency matrix (SOFM), to extract the sequence-order information of neighboring residues from a multiple sequence alignment (MSA). Combined with two profile feature extraction approaches, top-n-grams and the Smith-Waterman algorithm, SOFMs are applied to protein remote homology detection and fold recognition, yielding two predictors called SOFM-Top and SOFM-SW. Experimental results show that SOFM contains more information content than other profiles, and that these two predictors outperform other state-of-the-art methods. It is anticipated that SOFM will become a very useful profile in the study of protein structure and function.
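
The abstract's remark that a PSFM ignores sequence order can be illustrated with a minimal Python sketch of a conventional PSFM computed from a toy MSA. This is not the SOFM construction, which the abstract does not specify. The function name `psfm`, the toy alignment, and the choice to skip gap characters are all assumptions made for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def psfm(msa):
    """Per-column amino-acid frequencies for an MSA given as equal-length strings.

    Hypothetical helper: gaps and non-standard symbols are simply skipped,
    which is one common convention among several.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    matrix = []
    for col in range(length):
        # Each column is counted on its own; neighboring columns are never consulted.
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values())
        matrix.append({aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS})
    return matrix

# Toy alignment (made up for this example, not data from the paper).
toy_msa = ["ACD", "ACE", "GCD", "A-D"]
profile = psfm(toy_msa)
print(profile[0]["A"], profile[1]["C"], profile[2]["E"])
```

Because every column is processed independently, the resulting frequencies carry no information about which residues tend to follow one another, which is the gap the abstract says SOFM is meant to address.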
