Automatic prediction of protein domains from sequence information using a hybrid learning system.

Author information

Nagarajan Niranjan, Yona Golan

Affiliation

Department of Computer Science, Cornell University, Upson Hall, Ithaca, NY 14853, USA.

Publication information

Bioinformatics. 2004 Jun 12;20(9):1335-60. doi: 10.1093/bioinformatics/bth086. Epub 2004 Feb 12.

DOI:10.1093/bioinformatics/bth086

PMID:14962932

Abstract

MOTIVATION

We describe a novel method for detecting the domain structure of a protein from sequence information alone. The method is based on analyzing multiple sequence alignments that are derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence and are combined into a single predictor using a neural network. The output is further smoothed and post-processed using a probabilistic model to predict the most likely transition positions between domains.
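
The pipeline described above (per-position measures from an alignment, a learned combination, smoothing, then boundary selection) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the two measures (column entropy and gap fraction), the single logistic unit with hand-picked weights standing in for a trained neural network, the fixed smoothing kernel, and the simple peak picking used in place of the probabilistic model. The abstract does not specify the actual measures, network architecture, or post-processing model, so this should not be read as the authors' implementation.

```python
import math
from collections import Counter


def column_entropy(alignment):
    """Shannon entropy (bits) of the residues in each column, ignoring gaps."""
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        total = len(residues)
        counts = Counter(residues)
        scores.append(-sum((n / total) * math.log2(n / total) for n in counts.values()))
    return scores


def gap_fraction(alignment):
    """Fraction of aligned sequences that have a gap in each column."""
    return [column.count("-") / len(column) for column in zip(*alignment)]


def combine(features, weights, bias):
    """Single logistic unit: a hypothetical stand-in for a trained network."""
    combined = []
    for values in zip(*features):
        z = bias + sum(w * v for w, v in zip(weights, values))
        combined.append(1.0 / (1.0 + math.exp(-z)))
    return combined


def smooth(scores, kernel=(0.25, 0.5, 0.25)):
    """Weighted moving average; the kernel is renormalized at the sequence ends."""
    half = len(kernel) // 2
    out = []
    for i in range(len(scores)):
        total = weight = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(scores):
                total += w * scores[j]
                weight += w
        out.append(total / weight)
    return out


def candidate_boundaries(scores, threshold=0.4):
    """Interior positions that are local maxima at or above the threshold."""
    peaks = []
    for i in range(1, len(scores) - 1):
        if scores[i] >= threshold and scores[i] > scores[i - 1] and scores[i] >= scores[i + 1]:
            peaks.append(i)
    return peaks


def predict_boundaries(alignment, weights=(1.0, 4.0), bias=-2.0):
    """Run the toy pipeline: measures -> combination -> smoothing -> peaks."""
    features = [column_entropy(alignment), gap_fraction(alignment)]
    return candidate_boundaries(smooth(combine(features, weights, bias)))


# Toy alignment: two conserved blocks separated by one variable, gappy column.
toy_alignment = [
    "AAAAGCCCC",
    "AAAA-CCCC",
    "AAAA-CCCC",
    "AAAATCCCC",
]
boundaries = predict_boundaries(toy_alignment)
```

In this toy alignment only column 4 (0-based) is both variable and gappy, so it is the only position whose smoothed score clears the threshold. A real predictor would learn the combination from proteins with known domain definitions rather than use fixed weights.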

RESULTS

The method was assessed using the domain definitions in SCOP and CATH for proteins of known structure and was compared with several other existing methods. Our method performs well both in terms of accuracy and sensitivity. It improves significantly over the best methods available, even some of the semi-manual ones, while being fully automatic. Our method can also be used to suggest and verify domain partitions based on structural data. A few examples of predicted domain definitions and alternative partitions, as suggested by our method, are also discussed.

AVAILABILITY

An online domain-prediction server is available at http://biozon.org/tools/domains/
