Structural classification of proteins using texture descriptors extracted from the cellular automata image.

Author information

Kavianpour Hamidreza, Vasighi Mahdi

Affiliation

Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), 45137-66731, Zanjan, Iran.

Publication information

Amino Acids. 2017 Feb;49(2):261-271. doi: 10.1007/s00726-016-2354-5. Epub 2016 Oct 24.

Abstract

Nowadays, knowledge about the cellular attributes of proteins plays an important role in pharmacy, medical science and molecular biology. These attributes are closely correlated with the function and three-dimensional structure of proteins. Knowledge of the protein structural class is used by various methods to better understand protein functionality and folding patterns. Computational methods and intelligent systems can play an important role in the structural classification of proteins. Most protein sequences are stored in databanks as character strings, so a numerical representation is essential for applying machine learning methods. In this work, a binary representation of protein sequences is introduced based on reduced amino acid alphabets derived from the surrounding hydrophobicity index. Many important features hidden in these long binary sequences can be clearly displayed through their cellular automata images. The features extracted from these images are used to build a classification model with a support vector machine. Compared with previous studies on several benchmark datasets, the promising classification rates obtained by tenfold cross-validation imply that the current approach can help reveal some inherent features deeply hidden in protein sequences and improve the quality of protein structural class prediction.
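
The pipeline summarized above (binary encoding of a sequence, a cellular automaton image, texture features) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the four-group alphabet in HYPOTHETICAL_GROUPS is invented for illustration and is not the surrounding-hydrophobicity grouping used in the paper, the elementary CA rule (rule 30 in the usage lines) is an arbitrary choice, and the co-occurrence statistics in glcm_features stand in for whatever texture descriptors the paper actually extracts. The SVM stage is omitted; the resulting feature vectors could be passed to a classifier such as scikit-learn's SVC.

```python
"""Toy sketch: protein sequence -> binary string -> CA image -> texture features.

All groupings and parameter choices below are illustrative assumptions,
not the method published by Kavianpour and Vasighi.
"""

# HYPOTHETICAL reduced alphabet: four groups, each mapped to a 2-bit code.
# This is NOT the surrounding-hydrophobicity grouping from the paper.
HYPOTHETICAL_GROUPS = {
    "00": "AVLIMFWC",
    "01": "GPSTY",
    "10": "NQHDE",
    "11": "KR",
}

# Invert the table: one-letter amino acid code -> 2-bit string.
CODE = {aa: bits for bits, members in HYPOTHETICAL_GROUPS.items() for aa in members}


def encode(sequence):
    """Concatenate the 2-bit code of each residue into a list of 0/1 ints."""
    return [int(b) for aa in sequence.upper() for b in CODE[aa]]


def evolve(row, rule, steps):
    """Run an elementary cellular automaton (Wolfram rule numbering).

    Each new cell is bit (4*left + 2*center + right) of `rule`, with periodic
    boundaries. Returns the initial row plus `steps` generations (the image).
    """
    n = len(row)
    image = [list(row)]
    for _ in range(steps):
        prev = image[-1]
        image.append([
            (rule >> (4 * prev[(i - 1) % n] + 2 * prev[i] + prev[(i + 1) % n])) & 1
            for i in range(n)
        ])
    return image


def glcm_features(image):
    """Co-occurrence texture statistics for horizontally adjacent pixels.

    A simple stand-in for the paper's texture descriptors (assumption).
    """
    counts = [[0, 0], [0, 0]]
    for row in image:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
    total = sum(sum(r) for r in counts)
    if total == 0:
        raise ValueError("image needs at least two columns")
    p = [[c / total for c in r] for r in counts]
    pairs = [(a, b) for a in (0, 1) for b in (0, 1)]
    return {
        "contrast": sum(p[a][b] * (a - b) ** 2 for a, b in pairs),
        "energy": sum(p[a][b] ** 2 for a, b in pairs),
        "homogeneity": sum(p[a][b] / (1 + abs(a - b)) for a, b in pairs),
    }


if __name__ == "__main__":
    bits = encode("MKTAYIAK")  # toy sequence -> 16 bits
    image = evolve(bits, rule=30, steps=16)
    print(glcm_features(image))
```

In a full experiment, such features would be computed for every sequence in a benchmark dataset and the classifier evaluated with tenfold cross-validation, as the abstract describes.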
