Automatic classification of protein structures using physicochemical parameters.

Authors

Mohan Abhilash, Rao M Divya, Sunderrajan Shruthi, Pennathur Gautam

Affiliation

The Center for Biotechnology, Anna University, Chennai, 600025, Tamilnadu, India.

Publication information

Interdiscip Sci. 2014 Sep;6(3):176-86. doi: 10.1007/s12539-013-0199-0. Epub 2014 Sep 11.

Abstract

Protein classification is the first step toward functional annotation; the SCOP and Pfam databases are currently the most relevant protein classification schemes. However, the disproportion between the number of three-dimensional (3D) protein structures generated and the number classified into relevant superfamilies/families emphasizes the need for automated classification schemes. Predicting the function of novel proteins based on sequence information alone has proven to be a major challenge. The present study focuses on the use of physicochemical parameters in conjunction with machine learning algorithms (Naive Bayes, Decision Trees, Random Forest and Support Vector Machines) to classify proteins into their respective SCOP superfamily/Pfam family, using sequence-derived information. Spectrophores™, a 1D descriptor of the 3D molecular field surrounding a structure, was used as a benchmark against which to compare the performance of the physicochemical parameters. The machine learning algorithms were modified to select features based on information gain for each SCOP superfamily/Pfam family. The effect of combining physicochemical parameters and spectrophores on classification accuracy (CA) was also studied. Machine learning algorithms trained with the physicochemical parameters consistently classified SCOP superfamilies and Pfam families with a CA above 90%, while spectrophores achieved a CA of around 85%. Feature selection improved CA for both the physicochemical-parameter-based and the spectrophore-based machine learning algorithms. Combining both sets of attributes resulted in a marginal loss of performance. Physicochemical parameters were able to classify proteins from both schemes with a CA ranging from 90% to 96%. These results suggest the usefulness of this method in classifying proteins from amino acid sequences.
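The abstract does not list the physicochemical parameters or the exact feature-selection procedure, so the following is only a minimal sketch of the general idea: derive a few descriptors from an amino acid sequence, then rank them by information gain for one family at a time (one-vs-rest). The four descriptors, the median split used to discretize each one, the function names, and the toy sequences and family labels are all illustrative assumptions, not the authors' method. Only the Kyte-Doolittle hydropathy scale is a standard published table.

```python
import math
from collections import Counter
from statistics import median

# Kyte-Doolittle hydropathy scale (standard published values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(seq):
    """Compute a few sequence-derived descriptors (an assumed, illustrative set)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    return {
        "length": n,
        "mean_hydropathy": sum(KD[a] for a in seq) / n,
        "frac_charged": sum(a in "DEKR" for a in seq) / n,
        "frac_aromatic": sum(a in "FWY" for a in seq) / n,
    }


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the samples at value <= threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    conditional = sum(len(p) / n * entropy(p) for p in (left, right) if p)
    return entropy(labels) - conditional


def rank_features(rows, labels, target):
    """Rank features by information gain for one family versus all others."""
    binary = [lab == target for lab in labels]
    scores = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        # Assumption: a single median split stands in for proper discretization.
        scores[name] = information_gain(values, binary, median(values))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy, invented sequences and labels, for illustration only.
    seqs = ["AILVVLAGFW", "LLIVAAVFIL", "DEKRKDESNQ", "KKDEERDSTN"]
    labels = ["fam_A", "fam_A", "fam_B", "fam_B"]
    rows = [sequence_features(s) for s in seqs]
    for name, gain in rank_features(rows, labels, target="fam_A"):
        print(f"{name}: {gain:.3f}")
```

In a fuller pipeline, the top-ranked features for each family would then be passed to a classifier such as Naive Bayes or a Random Forest; that step is omitted here because the abstract gives no details about how the algorithms were modified.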
