Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences.

Affiliation

Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada.

Publication information

BMC Bioinformatics. 2009 Dec 13;10:414. doi: 10.1186/1471-2105-10-414.

PMID: 20003388
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2805645/

Abstract

BACKGROUND

Knowledge of structural class is used by numerous methods for the identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and are designed for high-identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences.
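
To make "twilight-zone identity" concrete, here is a minimal sketch that computes percent identity between two already-aligned sequences and checks it against a twilight-zone band. Two assumptions are labeled here and in the code: identity is counted only over gap-free alignment columns (definitions vary), and the 20% to 35% band is a commonly cited range, not necessarily the cutoff used for the benchmark datasets in this paper. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gap columns are excluded from the denominator; other
    definitions (e.g. dividing by the shorter sequence length) also exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def in_twilight_zone(identity: float, low: float = 20.0, high: float = 35.0) -> bool:
    """Assumed twilight-zone band; adjust the cutoffs to the dataset in use."""
    return low <= identity <= high


if __name__ == "__main__":
    ident = percent_identity("ACDEFGHIKL", "ACDWWWWWWW")
    print(ident, in_twilight_zone(ident))  # 3 of 10 columns identical
```

Here three of ten aligned columns match, giving 30.0% identity, which falls inside the assumed band.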

RESULTS

The proposed MODular Approach to Structural class prediction (MODAS) method is unique in that it allows selection of any subset of the classes. MODAS is also the first to use a novel, custom-built, feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes, including conservation of residues and the arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers tailored to each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets, and comparison with over two dozen modern competing predictors, show that MODAS provides the best overall accuracy, ranging between 80% and 96.7% (83.5% for the twilight-zone datasets) depending on the dataset. This translates into 19% and 8% error rate reductions compared with the best-performing competing method on the two largest datasets. The proposed predictor achieves 58% accuracy for the membrane protein class, which most existing methods do not consider, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes.
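
The "arrangement and number of helix/strand segments" can be illustrated with a small sketch. It run-length encodes a predicted secondary-structure string and derives segment counts, average segment lengths, and a simple arrangement feature: the number of switches between helix and strand in segment order, with coil ignored. The H/E/C alphabet, the feature names, and the `min_len` filter are all assumptions for illustration. The abstract does not give the exact MODAS feature definitions.

```python
from itertools import groupby


def ss_segments(ss: str) -> list[tuple[str, int]]:
    """Run-length encode a predicted secondary-structure string (assumed H/E/C)."""
    return [(state, len(list(run))) for state, run in groupby(ss)]


def segment_features(ss: str, min_len: int = 1) -> dict[str, float]:
    """Hypothetical segment count/arrangement features, not the MODAS definitions."""
    # Keep only helix (H) and strand (E) segments of at least min_len residues.
    segments = [(s, n) for s, n in ss_segments(ss) if s in ("H", "E") and n >= min_len]
    helices = [n for s, n in segments if s == "H"]
    strands = [n for s, n in segments if s == "E"]
    # Arrangement: count helix<->strand switches between consecutive kept segments.
    order = [s for s, _ in segments]
    switches = sum(1 for a, b in zip(order, order[1:]) if a != b)
    return {
        "n_helix": len(helices),
        "n_strand": len(strands),
        "avg_helix_len": sum(helices) / len(helices) if helices else 0.0,
        "avg_strand_len": sum(strands) / len(strands) if strands else 0.0,
        "helix_strand_switches": switches,
    }


if __name__ == "__main__":
    ss = "CC" + "HHHH" + "CC" + "EEEE" + "CC" + "HHHHHH" + "CC" + "EEE" + "C"
    print(segment_features(ss))
    print(segment_features(ss, min_len=4))  # drops the 3-residue strand
```

Raising `min_len` is one way to suppress very short, likely spurious predicted segments. Whether MODAS filters segments this way is not stated in the abstract.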

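One reading of why a per-class design "allows for selection of any subset of the classes" is a one-vs-rest scheme: each class has its own binary model, so restricting the prediction to a subset only shrinks the candidate list and needs no retraining. The sketch below shows that combination step. It is an assumption about the mechanism, not the authors' code. Plain linear decision functions stand in for the trained per-class SVMs, and the class names and weights are illustrative. Of these names, only the membrane class is mentioned in the abstract.

```python
from typing import Callable, Dict, Iterable, Sequence

# A scorer maps a feature vector to a decision value (higher = more likely
# this class). Linear functions stand in for trained per-class SVMs.
Scorer = Callable[[Sequence[float]], float]


def linear_scorer(weights: Sequence[float], bias: float) -> Scorer:
    """Hypothetical stand-in for one trained binary classifier."""
    def score(x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return score


def predict(x: Sequence[float], scorers: Dict[str, Scorer],
            selected: Iterable[str]) -> str:
    """One-vs-rest: return the selected class with the highest decision value."""
    candidates = list(selected)
    unknown = [c for c in candidates if c not in scorers]
    if unknown:
        raise KeyError(f"no classifier for: {unknown}")
    if not candidates:
        raise ValueError("select at least one class")
    return max(candidates, key=lambda c: scorers[c](x))


if __name__ == "__main__":
    scorers = {
        "all-alpha": linear_scorer([1.0, -1.0], 0.0),
        "all-beta": linear_scorer([-1.0, 1.0], 0.0),
        "membrane": linear_scorer([0.0, 0.0], 0.5),
    }
    x = [0.2, 0.9]
    print(predict(x, scorers, scorers))                    # all classes
    print(predict(x, scorers, ["all-alpha", "membrane"]))  # a chosen subset
```

With all three classes, "all-beta" wins (decision value 0.7). Restricting the candidates to "all-alpha" and "membrane" returns "membrane" (0.5 versus -0.7) without touching any of the individual models.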
CONCLUSIONS

The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.
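
One simple way to "combine evolutionary and secondary structure information" is to average a per-residue conservation score separately over residues predicted as helix, strand, and coil. The sketch assumes the conservation scores have already been derived from an evolutionary profile (for example, a PSI-BLAST PSSM). The abstract does not say how MODAS computes conservation, so both the input and the function name are hypothetical.

```python
from typing import Dict, Sequence


def conservation_by_state(ss: str, conservation: Sequence[float]) -> Dict[str, float]:
    """Mean per-residue conservation over residues predicted as H, E, and C.

    Assumptions: ss uses the H/E/C alphabet, and conservation[i] is a
    precomputed score for residue i (its derivation is not specified here).
    """
    if len(ss) != len(conservation):
        raise ValueError("ss and conservation must be the same length")
    totals = {"H": 0.0, "E": 0.0, "C": 0.0}
    counts = {"H": 0, "E": 0, "C": 0}
    for state, score in zip(ss, conservation):
        totals[state] += score  # KeyError for states outside H/E/C
        counts[state] += 1
    # States absent from the prediction get 0.0 rather than a division error.
    return {s: (totals[s] / counts[s] if counts[s] else 0.0) for s in totals}


if __name__ == "__main__":
    print(conservation_by_state("HHEC", [1.0, 3.0, 4.0, 0.5]))
```

Here the helix residues average 2.0, the single strand residue 4.0, and the coil residue 0.5. A higher value for a state suggests that its predicted segments are more conserved.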

Similar articles

1
SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.
BMC Bioinformatics. 2008 May 1;9:226. doi: 10.1186/1471-2105-9-226.
2
Prediction of protein structural class using novel evolutionary collocation-based sequence representation.
J Comput Chem. 2008 Jul 30;29(10):1596-604. doi: 10.1002/jcc.20918.
3
Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments.
BMC Bioinformatics. 2008 Oct 10;9:430. doi: 10.1186/1471-2105-9-430.
4
Prediction of protein structural class for the twilight zone sequences.
Biochem Biophys Res Commun. 2007 Jun 1;357(2):453-60. doi: 10.1016/j.bbrc.2007.03.164. Epub 2007 Apr 5.
5
PFRES: protein fold classification by using evolutionary information and predicted secondary structure.
Bioinformatics. 2007 Nov 1;23(21):2843-50. doi: 10.1093/bioinformatics/btm475. Epub 2007 Oct 17.
6
Beyond the Twilight Zone: automated prediction of structural properties of proteins by recursive neural networks and remote homology information.
Proteins. 2009 Oct;77(1):181-90. doi: 10.1002/prot.22429.
7
iFC²: an integrated web-server for improved prediction of protein structural class, fold type, and secondary structure content.
Amino Acids. 2011 Mar;40(3):963-73. doi: 10.1007/s00726-010-0721-1. Epub 2010 Aug 21.
8
PSS-3D1D: an improved 3D1D profile method of protein fold recognition for the annotation of twilight zone sequences.
J Struct Funct Genomics. 2011 Dec;12(4):181-9. doi: 10.1007/s10969-011-9119-x. Epub 2011 Dec 3.
9
PredictSuperFam-PSS-3D1D: A server for predicting superfamily for the annotation of twilight zone protein sequences.
J Struct Biol. 2020 May 1;210(2):107479. doi: 10.1016/j.jsb.2020.107479. Epub 2020 Feb 17.

Cited by

1
Using Recursive Feature Selection with Random Forest to Improve Protein Structural Class Prediction for Low-Similarity Sequences.
Comput Math Methods Med. 2021 May 7;2021:5529389. doi: 10.1155/2021/5529389. eCollection 2021.
2
New insight into poly (3-hydroxybutyrate) production by Azomonas macrocytogenes isolate KC685000: large scale production, kinetic modeling, recovery and characterization.
Mol Biol Rep. 2019 Jun;46(3):3357-3370. doi: 10.1007/s11033-019-04798-4. Epub 2019 Apr 17.
3
Prediction of Protein Structural Class Based on Gapped-Dipeptides and a Recursive Feature Selection Approach.
Int J Mol Sci. 2015 Dec 24;17(1):15. doi: 10.3390/ijms17010015.
4
General overview on structure prediction of twilight-zone proteins.
Theor Biol Med Model. 2015 Sep 4;12:15. doi: 10.1186/s12976-015-0014-1.
5
Customised fragments libraries for protein structure prediction based on structural class annotations.
BMC Bioinformatics. 2015 Apr 29;16(1):136. doi: 10.1186/s12859-015-0576-2.
6
Many local pattern texture features: which is better for image-based multilabel human protein subcellular localization classification?
ScientificWorldJournal. 2014;2014:429049. doi: 10.1155/2014/429049. Epub 2014 Jun 24.
7
Quad-PRE: a hybrid method to predict protein quaternary structure attributes.
Comput Math Methods Med. 2014;2014:715494. doi: 10.1155/2014/715494. Epub 2014 May 18.
8
PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations.
PLoS One. 2014 Mar 27;9(3):e92863. doi: 10.1371/journal.pone.0092863. eCollection 2014.
9
Proposing a highly accurate protein structural class predictor using segmentation-based features.
BMC Genomics. 2014;15 Suppl 1(Suppl 1):S2. doi: 10.1186/1471-2164-15-S1-S2. Epub 2014 Jan 24.
10
Insights from the molecular characterization of mercury stress proteins identified by proteomics in E.coli nissle 1917.
Bioinformation. 2013 May 25;9(9):485-90. doi: 10.6026/97320630009485. Print 2013.