An ensemble approach to protein fold classification by integration of template-based assignment and support vector machine classifier.

Authors

Xia Jiaqi, Peng Zhenling, Qi Dawei, Mu Hongbo, Yang Jianyi

Affiliations

Department of Physics, Northeast Forestry University, Harbin, China.

Center for Applied Mathematics, Tianjin University, Tianjin, China.

Publication information

Bioinformatics. 2017 Mar 15;33(6):863-870. doi: 10.1093/bioinformatics/btw768.

DOI:10.1093/bioinformatics/btw768

PMID:28039166

Abstract

MOTIVATION

Protein fold classification is a critical step in protein structure prediction. There are two possible ways to classify protein folds. One is template-based fold assignment and the other is ab-initio prediction using machine learning algorithms. Combining the two approaches to improve prediction accuracy has not been explored before.

RESULTS

We developed two algorithms, HH-fold and SVM-fold, for protein fold classification. HH-fold is a template-based fold assignment algorithm using the HHsearch program. SVM-fold is a support vector machine-based ab-initio classification algorithm, in which a comprehensive set of features is extracted from three complementary sequence profiles. These two algorithms are then combined, resulting in the ensemble approach TA-fold. We performed a comprehensive assessment of the proposed methods by comparing them with ab-initio methods and template-based threading methods on six benchmark datasets. TA-fold achieved an accuracy of 0.799 on the DD dataset, which consists of proteins from 27 folds. This represents an improvement of 5.4-11.7% over ab-initio methods. After updating this dataset to include more proteins in the same folds, the accuracy increased to 0.971. In addition, TA-fold achieved >0.9 accuracy on a large dataset consisting of 6451 proteins from 184 folds. Experiments on the LE dataset show that TA-fold consistently outperforms other threading methods at the family, superfamily and fold levels. The success of TA-fold is attributed to the combination of template-based fold assignment and ab-initio classification using features from complementary sequence profiles that contain rich evolutionary information.
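The abstract does not state how the template-based and ab-initio predictions are combined. As a rough illustration only, the sketch below assumes one plausible rule: trust the template-based assignment when its confidence clears a cutoff, and otherwise fall back to the ab-initio classifier. The names `TemplateHit`, `ensemble_predict`, `accuracy`, the `cutoff` value, and the toy stand-in functions are all hypothetical and are not taken from the paper or from the HHsearch program.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class TemplateHit:
    """Best template match for a query (hypothetical structure)."""
    fold: str
    confidence: float  # assumed to lie in [0, 1]


def ensemble_predict(
    sequence: str,
    template_assign: Callable[[str], Optional[TemplateHit]],
    ab_initio_classify: Callable[[str], str],
    cutoff: float = 0.5,  # hypothetical threshold, not taken from the paper
) -> Tuple[str, str]:
    """Return (predicted fold, which component produced it)."""
    hit = template_assign(sequence)
    # Assumed rule: use the template-based fold only when it is confident.
    if hit is not None and hit.confidence >= cutoff:
        return hit.fold, "template"
    return ab_initio_classify(sequence), "ab-initio"


def accuracy(predicted: List[str], actual: List[str]) -> float:
    """Fraction of proteins whose predicted fold matches the true fold."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have equal length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy stand-ins for the two component methods (illustration only).
def toy_template_assign(sequence: str) -> Optional[TemplateHit]:
    library = {
        "MKVLA": TemplateHit("a.1", 0.95),  # confident template match
        "GSSGS": TemplateHit("b.1", 0.20),  # weak match, below the cutoff
    }
    return library.get(sequence)


def toy_ab_initio_classify(sequence: str) -> str:
    # Stands in for a trained classifier; the rule itself is meaningless.
    return "c.1" if sequence.count("G") >= 2 else "d.1"


queries = ["MKVLA", "GSSGS", "PPPPP"]
results = [
    ensemble_predict(q, toy_template_assign, toy_ab_initio_classify)
    for q in queries
]
print(results)
```

In this toy run the first query is resolved by the template component, while the weak hit and the query with no template both fall through to the ab-initio component. A real system would replace the two stand-in functions with a template search and a trained classifier.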

AVAILABILITY AND IMPLEMENTATION

http://yanglab.nankai.edu.cn/TA-fold/.

CONTACT

yangjy@nankai.edu.cn or mhb-506@163.com.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

CoABind: a novel algorithm for Coenzyme A (CoA)- and CoA derivatives-binding residues prediction.

Bioinformatics. 2018 Aug 1;34(15):2598-2604. doi: 10.1093/bioinformatics/bty162.

Improving the prediction of protein-nucleic acids binding residues via multiple sequence profiles and the consensus of complementary methods.

Bioinformatics. 2019 Mar 15;35(6):930-936. doi: 10.1093/bioinformatics/bty756.

A machine learning information retrieval approach to protein fold recognition.

Bioinformatics. 2006 Jun 15;22(12):1456-63. doi: 10.1093/bioinformatics/btl102. Epub 2006 Mar 17.

DeepSF: deep convolutional neural network for mapping protein sequences to folds.

Bioinformatics. 2018 Apr 15;34(8):1295-1303. doi: 10.1093/bioinformatics/btx780.

Protein fold classification with genetic algorithms and feature selection.

J Bioinform Comput Biol. 2009 Oct;7(5):773-88. doi: 10.1142/s0219720009004321.

Protein Fold Recognition by Combining Support Vector Machines and Pairwise Sequence Similarity Scores.

IEEE/ACM Trans Comput Biol Bioinform. 2021 Sep-Oct;18(5):2008-2016. doi: 10.1109/TCBB.2020.2966450. Epub 2021 Oct 7.

CATHER: a novel threading algorithm with predicted contacts.

Bioinformatics. 2020 Apr 1;36(7):2119-2125. doi: 10.1093/bioinformatics/btz876.

Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.

PLoS Comput Biol. 2017 Jan 5;13(1):e1005324. doi: 10.1371/journal.pcbi.1005324. eCollection 2017 Jan.

A new taxonomy-based protein fold recognition approach based on autocross-covariance transformation.

Bioinformatics. 2009 Oct 15;25(20):2655-62. doi: 10.1093/bioinformatics/btp500. Epub 2009 Aug 25.

Cited by

Multi-layer sequential network analysis improves protein 3D structural classification.

Proteins. 2022 Sep;90(9):1721-1731. doi: 10.1002/prot.26349. Epub 2022 May 2.

BioS2Net: Holistic Structural and Sequential Analysis of Biomolecules Using a Deep Neural Network.

Int J Mol Sci. 2022 Mar 9;23(6):2966. doi: 10.3390/ijms23062966.

Improving protein fold recognition using triplet network and ensemble deep learning.

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab248.

Why can deep convolutional neural networks improve protein fold recognition? A visual explanation by interpretation.

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab001.

EnACP: An Ensemble Learning Model for Identification of Anticancer Peptides.

Front Genet. 2020 Jul 30;11:760. doi: 10.3389/fgene.2020.00760. eCollection 2020.

Network-based protein structural classification.

R Soc Open Sci. 2020 Jun 3;7(6):191461. doi: 10.1098/rsos.191461. eCollection 2020 Jun.

A new method for the high-precision assessment of tumor changes in response to treatment.

Bioinformatics. 2018 Aug 1;34(15):2625-2633. doi: 10.1093/bioinformatics/bty115.

mTM-align: an algorithm for fast and accurate multiple protein structure alignment.

Bioinformatics. 2018 May 15;34(10):1719-1725. doi: 10.1093/bioinformatics/btx828.

DeepSF: deep convolutional neural network for mapping protein sequences to folds.

Bioinformatics. 2018 Apr 15;34(8):1295-1303. doi: 10.1093/bioinformatics/btx780.
