Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection.

Author information

Damoulas Theodoros, Girolami Mark A

Affiliation

Department of Computing Science, University of Glasgow, S. A. W. Building, G12 8QQ, UK.

Publication information

Bioinformatics. 2008 May 15;24(10):1264-70. doi: 10.1093/bioinformatics/btn112. Epub 2008 Mar 31.

DOI:10.1093/bioinformatics/btn112

PMID:18378524

Abstract

MOTIVATION

The problems of protein fold recognition and remote homology detection have recently attracted a great deal of interest as they represent challenging multi-feature multi-class problems for which modern pattern recognition methods achieve only modest levels of performance. As with many pattern recognition problems, there are multiple feature spaces or groups of attributes available, such as global characteristics like the amino-acid composition (C), predicted secondary structure (S), hydrophobicity (H), van der Waals volume (V), polarity (P), polarizability (Z), as well as attributes derived from local sequence alignment such as the Smith-Waterman scores. This raises the need for a classification method that is able to assess the contribution of these potentially heterogeneous object descriptors while utilizing such information to improve predictive performance. To that end, we offer a single multi-class kernel machine that informatively combines the available feature groups and, as is demonstrated in this article, is able to provide the state-of-the-art in performance accuracy on the fold recognition problem. Furthermore, the proposed approach provides some insight by assessing the significance of recently introduced protein features and string kernels. The proposed method is well-founded within a Bayesian hierarchical framework and a variational Bayes approximation is derived which allows for efficient CPU processing times.

RESULTS

The best performance which we report on the SCOP PDB-40D benchmark data-set is a 70% accuracy by combining all the available feature groups from global protein characteristics but also including sequence-alignment features. We offer an 8% improvement on the best reported performance that combines multi-class k-nn classifiers while at the same time reducing computational costs and assessing the predictive power of the various available features. Furthermore, we examine the performance of our methodology on the SCOP 1.53 benchmark data-set that simulates remote homology detection and examine the combination of various state-of-the-art string kernels that have recently been proposed.
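The idea of combining several feature groups inside one kernel machine can be illustrated with a convex combination of base kernels, which is one common construction in multi-kernel learning. The sketch below is only an illustration under stated assumptions: the abstract does not give the exact form of the composite kernel, the weights here are fixed by hand rather than inferred by the variational Bayes approximation the authors describe, the feature matrices are random stand-ins rather than protein data, and the names `rbf_kernel` and `combine_kernels` are hypothetical helpers.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        (X ** 2).sum(axis=1)[:, None]
        + (Y ** 2).sum(axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def combine_kernels(kernels, weights):
    """Return a convex combination of base kernel matrices and the normalized weights.

    Assumption: one non-negative weight per feature group, normalized to sum to 1.
    """
    weights = np.asarray(weights, dtype=float)
    if weights.shape != (len(kernels),):
        raise ValueError("need exactly one weight per kernel")
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    weights = weights / weights.sum()
    combined = sum(w * K for w, K in zip(weights, kernels))
    return combined, weights


# Synthetic stand-ins for two feature groups (NOT real protein data).
rng = np.random.default_rng(0)
n_proteins = 6
composition = rng.random((n_proteins, 20))      # stand-in for composition (C)
hydrophobicity = rng.random((n_proteins, 21))   # stand-in for hydrophobicity (H)

base_kernels = [
    rbf_kernel(composition, composition, gamma=0.5),
    rbf_kernel(hydrophobicity, hydrophobicity, gamma=0.5),
]

# Hand-picked weights; a learning method would estimate these from data.
K, beta = combine_kernels(base_kernels, weights=[3.0, 1.0])
print("normalized weights:", beta)
print("composite kernel shape:", K.shape)
```

Because each base kernel is positive semi-definite and the weights are non-negative, the composite matrix `K` is also a valid kernel and can be passed to any kernel classifier. In a learned setting, the size of each weight gives a rough indication of how much the corresponding feature group contributes.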

Similar articles

Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection.

Bioinformatics. 2008 May 15;24(10):1264-70. doi: 10.1093/bioinformatics/btn112. Epub 2008 Mar 31.

SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.

BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2.

Remote protein homology detection and fold recognition using two-layer support vector machine classifiers.

Comput Biol Med. 2011 Aug;41(8):687-99. doi: 10.1016/j.compbiomed.2011.06.004. Epub 2011 Jun 25.

Mismatch string kernels for discriminative protein classification.

Bioinformatics. 2004 Mar 1;20(4):467-76. doi: 10.1093/bioinformatics/btg431. Epub 2004 Jan 22.

Application of latent semantic analysis to protein remote homology detection.

Bioinformatics. 2006 Feb 1;22(3):285-90. doi: 10.1093/bioinformatics/bti801. Epub 2005 Nov 29.

Prediction of protein folds: extraction of new features, dimensionality reduction, and fusion of heterogeneous classifiers.

IEEE Trans Nanobioscience. 2009 Mar;8(1):100-10. doi: 10.1109/TNB.2009.2016488. Epub 2009 Mar 10.

A structural alignment kernel for protein structures.

Bioinformatics. 2007 May 1;23(9):1090-8. doi: 10.1093/bioinformatics/btl642. Epub 2007 Jan 18.

Profile-based direct kernels for remote homology detection and fold recognition.

Bioinformatics. 2005 Dec 1;21(23):4239-47. doi: 10.1093/bioinformatics/bti687. Epub 2005 Sep 27.

Remote homology detection based on oligomer distances.

Bioinformatics. 2006 Sep 15;22(18):2224-31. doi: 10.1093/bioinformatics/btl376. Epub 2006 Jul 12.

Support Vector Machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs.

Bioinformatics. 2007 Dec 15;23(24):3320-7. doi: 10.1093/bioinformatics/btm527. Epub 2007 Nov 7.

Cited by

Machine Learning Based Multimodal Neuroimaging Genomics Dementia Score for Predicting Future Conversion to Alzheimer's Disease.

J Alzheimers Dis. 2022;87(3):1345-1365. doi: 10.3233/JAD-220021.

Predicting Conversion from MCI to AD Combining Multi-Modality Data and Based on Molecular Subtype.

Brain Sci. 2021 May 21;11(6):674. doi: 10.3390/brainsci11060674.

Decoding Covert Speech From EEG-A Comprehensive Review.

Front Neurosci. 2021 Apr 29;15:642251. doi: 10.3389/fnins.2021.642251. eCollection 2021.

Blinded Clinical Evaluation for Dementia of Alzheimer's Type Classification Using FDG-PET: A Comparison Between Feature-Engineered and Non-Feature-Engineered Machine Learning Methods.

J Alzheimers Dis. 2021;80(2):715-726. doi: 10.3233/JAD-201591.

Using machine learning to quantify structural MRI neurodegeneration patterns of Alzheimer's disease into dementia score: Independent validation on 8,834 images from ADNI, AIBL, OASIS, and MIRIAD databases.

Hum Brain Mapp. 2020 Oct 1;41(14):4127-4147. doi: 10.1002/hbm.25115. Epub 2020 Jul 2.

Adaptive multi-degree of freedom Brain Computer Interface using online feedback: Towards novel methods and metrics of mutual adaptation between humans and machines for BCI.

PLoS One. 2019 Mar 6;14(3):e0212620. doi: 10.1371/journal.pone.0212620. eCollection 2019.

Incomplete Multiview Clustering via Late Fusion.

Comput Intell Neurosci. 2018 Oct 1;2018:6148456. doi: 10.1155/2018/6148456. eCollection 2018.

Development and validation of a novel dementia of Alzheimer's type (DAT) score based on metabolism FDG-PET imaging.

Neuroimage Clin. 2018 Mar 10;18:802-813. doi: 10.1016/j.nicl.2018.03.007. eCollection 2018.

DeepSF: deep convolutional neural network for mapping protein sequences to folds.

Bioinformatics. 2018 Apr 15;34(8):1295-1303. doi: 10.1093/bioinformatics/btx780.

Recent Progress in Machine Learning-Based Methods for Protein Fold Recognition.

Int J Mol Sci. 2016 Dec 16;17(12):2118. doi: 10.3390/ijms17122118.
