Towards an automatic classification of protein structural domains based on structural similarity.

Authors

Sam Vichetra, Tai Chin-Hsien, Garnier Jean, Gibrat Jean-Francois, Lee Byungkook, Munson Peter J

Affiliation

Mathematical and Statistical Computing Laboratory, DCB, CIT, NIH, DHHS, Bethesda, MD, USA.

Publication

BMC Bioinformatics. 2008 Jan 31;9:74. doi: 10.1186/1471-2105-9-74.

DOI: 10.1186/1471-2105-9-74
PMID: 18237410
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2267780/
Abstract

BACKGROUND

Formal classification of a large collection of protein structures aids the understanding of evolutionary relationships among them. Classifications involving manual steps, such as SCOP and CATH, face the challenge of an increasing volume of available structures. Automatic methods such as FSSP or the Dali Domain Dictionary yield divergent classifications, for reasons not yet fully investigated. One possible reason is that the pairwise similarity scores used in automatic classification do not adequately reflect the judgments made in manual classification. Another possibility is the difference between manual and automatic classification procedures. We explore the degree to which these two factors might affect the final classification.

RESULTS

We use DALI, SHEBA and VAST pairwise scores on the SCOP C class domains to investigate a variety of hierarchical clustering procedures. The constructed dendrogram is cut in a variety of ways to produce a partition, which is compared to the SCOP fold classification. Ward's method dendrograms led to the partitions closest to the SCOP fold classification. Dendrogram- or tree-cutting strategies fell into four categories according to the similarity of the resulting partitions to the SCOP fold partition. Two strategies that optimize similarity to SCOP gave an average true positive rate (TPR) of 72% at a 1% false positive rate. Cutting the largest cluster at each step gave an average TPR of 61% and was one of the best strategies that make no use of prior knowledge of SCOP. Cutting the longest branch at each step was one of the worst strategies. We also developed a method to detect irreducible differences between the best possible automatic partitions and SCOP, regardless of the cutting strategy. These differences are substantial. Visual examination of hard-to-classify proteins confirms our previous finding that global structural similarity of domains is not the only criterion used in the SCOP classification.
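The Results describe a pipeline of pairwise scores, Ward clustering, top-down dendrogram cutting and comparison with SCOP folds. A minimal plain-Python sketch of that pipeline follows, under assumptions that are ours rather than the paper's: similarity scores are converted to dissimilarities as `max_score - score`; Ward's method is implemented with the Lance-Williams update, which is exact only for squared Euclidean distances and is therefore a heuristic on alignment scores; "cutting the largest cluster" is read as replacing the biggest remaining cluster by its two children at each step; and agreement with SCOP is measured by pair counting (TPR is the fraction of same-fold pairs placed in one cluster, FPR the fraction of different-fold pairs placed in one cluster). Reporting the TPR of the first cut whose FPR drops to 1% is likewise an assumed way of reading an operating point. The function names (`ward_tree`, `cut_largest`, `pair_rates`, `tpr_at_fpr`) and the four-domain data are hypothetical.

```python
from itertools import combinations


def ward_tree(dist):
    """Agglomerate n items with Ward's criterion (Lance-Williams update).

    dist: symmetric n x n list of dissimilarities. Returns a binary tree
    as nested tuples whose leaves are item indices.
    """
    n = len(dist)
    clusters = {i: (i, 1) for i in range(n)}  # cluster id -> (subtree, size)
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}  # keys are (lo, hi)
    next_id = n
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of active clusters
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        dab = d.pop((a, b))
        for k, (_, nk) in clusters.items():
            dak = d.pop((min(a, k), max(a, k)))
            dbk = d.pop((min(b, k), max(b, k)))
            # Lance-Williams recurrence with Ward's coefficients.
            d[(k, next_id)] = ((na + nk) * dak + (nb + nk) * dbk - nk * dab) / (na + nb + nk)
        clusters[next_id] = ((ta, tb), na + nb)
        next_id += 1
    return next(iter(clusters.values()))[0]


def leaves(tree):
    """Item indices under a (sub)tree."""
    if isinstance(tree, int):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def cut_largest(tree, k):
    """Top-down cut: split the largest remaining cluster until there are k clusters."""
    parts = [tree]
    while len(parts) < k:
        splittable = [t for t in parts if not isinstance(t, int)]
        if not splittable:  # every cluster is already a single item
            break
        big = max(splittable, key=lambda t: len(leaves(t)))
        parts.remove(big)
        parts.extend(big)  # replace it by its two children
    return [leaves(t) for t in parts]


def pair_rates(partition, labels):
    """Pair-counting (TPR, FPR) of a partition against reference fold labels."""
    cluster_of = {i: c for c, members in enumerate(partition) for i in members}
    tp = fp = pos = neg = 0
    for i, j in combinations(sorted(cluster_of), 2):
        same_fold = labels[i] == labels[j]
        same_cluster = cluster_of[i] == cluster_of[j]
        if same_fold:
            pos += 1
            tp += same_cluster
        else:
            neg += 1
            fp += same_cluster
    return (tp / pos if pos else 0.0), (fp / neg if neg else 0.0)


def tpr_at_fpr(tree, labels, max_fpr=0.01):
    """Cut one more cluster at a time; report (k, TPR) once FPR <= max_fpr."""
    for k in range(1, len(labels) + 1):
        tpr, fpr = pair_rates(cut_largest(tree, k), labels)
        if fpr <= max_fpr:
            return k, tpr
    return None


# Hypothetical toy data: four domains in two folds, similarity scores 0..10.
labels = ["fold_a", "fold_a", "fold_b", "fold_b"]
scores = [
    [10, 8, 1, 2],
    [8, 10, 2, 1],
    [1, 2, 10, 7],
    [2, 1, 7, 10],
]
top = max(max(row) for row in scores)
dist = [[top - s for s in row] for row in scores]  # assumed score-to-distance map

tree = ward_tree(dist)
print(tree)                                      # ((0, 1), (2, 3))
print(pair_rates(cut_largest(tree, 2), labels))  # (1.0, 0.0)
print(tpr_at_fpr(tree, labels))                  # (2, 1.0)
```

On this toy matrix the two-cluster cut recovers both folds exactly. Each further cut refines the previous partition, so FPR can only fall as k grows; other strategies, such as splitting at the longest branch, would need merge heights stored in the tree and would visit partitions in a different order.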

CONCLUSION

Different clustering procedures give rise to different levels of agreement between automatic and manual protein classifications. None of the tested procedures completely eliminates the divergence between automatic and manual protein classifications. Achieving full agreement between these two approaches would apparently require additional information.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/144a/2267780/64f339713577/1471-2105-9-74-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/144a/2267780/9cadd7307397/1471-2105-9-74-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/144a/2267780/9926de276ebe/1471-2105-9-74-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/144a/2267780/ce2baca3c513/1471-2105-9-74-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/144a/2267780/8072e3b0cdc6/1471-2105-9-74-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/144a/2267780/09cd52338064/1471-2105-9-74-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/144a/2267780/c5076c76036d/1471-2105-9-74-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/144a/2267780/ac4d0eef281f/1471-2105-9-74-8.jpg

Similar articles

1
Towards an automatic classification of protein structural domains based on structural similarity.
BMC Bioinformatics. 2008 Jan 31;9:74. doi: 10.1186/1471-2105-9-74.
2
Cross-over between discrete and continuous protein structure space: insights into automatic classification and networks of protein structures.
PLoS Comput Biol. 2009 Mar;5(3):e1000331. doi: 10.1371/journal.pcbi.1000331. Epub 2009 Mar 27.
3
Automatic classification of protein structures using low-dimensional structure space mappings.
BMC Bioinformatics. 2014;15 Suppl 2(Suppl 2):S1. doi: 10.1186/1471-2105-15-S2-S1. Epub 2014 Jan 24.
4
ProClust: improved clustering of protein sequences with an extended graph-based approach.
Bioinformatics. 2002;18 Suppl 2:S182-91. doi: 10.1093/bioinformatics/18.suppl_2.s182.
5
ROC and confusion analysis of structure comparison methods identify the main causes of divergence from manual protein classification.
BMC Bioinformatics. 2006 Apr 13;7:206. doi: 10.1186/1471-2105-7-206.
6
Automatic prediction of protein domains from sequence information using a hybrid learning system.
Bioinformatics. 2004 Jun 12;20(9):1335-60. doi: 10.1093/bioinformatics/bth086. Epub 2004 Feb 12.
7
Accuracy of structure-based sequence alignment of automatic methods.
BMC Bioinformatics. 2007 Sep 20;8:355. doi: 10.1186/1471-2105-8-355.
8
A fast SCOP fold classification system using content-based E-Predict algorithm.
BMC Bioinformatics. 2006 Jul 26;7:362. doi: 10.1186/1471-2105-7-362.
9
Towards automatic clustering of protein sequences.
Proc IEEE Comput Soc Bioinform Conf. 2002;1:175-86.
10
A comprehensive system for evaluation of remote sequence similarity detection.
BMC Bioinformatics. 2007 Aug 28;8:314. doi: 10.1186/1471-2105-8-314.

Cited by

1
Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions.
Comput Struct Biotechnol J. 2017 Feb 8;15:243-254. doi: 10.1016/j.csbj.2017.01.011. eCollection 2017.
2
Automatic classification of protein structures using low-dimensional structure space mappings.
BMC Bioinformatics. 2014;15 Suppl 2(Suppl 2):S1. doi: 10.1186/1471-2105-15-S2-S1. Epub 2014 Jan 24.
3
Parallel implementation of 3D protein structure similarity searches using a GPU and the CUDA.
J Mol Model. 2014 Feb;20(2):2067. doi: 10.1007/s00894-014-2067-1. Epub 2014 Jan 31.
4
Automatic classification of protein structures relying on similarities between alignments.
BMC Bioinformatics. 2012 Sep 14;13:233. doi: 10.1186/1471-2105-13-233.
5
Overcoming sequence misalignments with weighted structural superposition.
Proteins. 2012 Nov;80(11):2523-35. doi: 10.1002/prot.24134. Epub 2012 Jul 28.
6
Touring protein space with Matt.
IEEE/ACM Trans Comput Biol Bioinform. 2012 Jan-Feb;9(1):286-93. doi: 10.1109/TCBB.2011.70. Epub 2011 Apr 1.
7
Cross-over between discrete and continuous protein structure space: insights into automatic classification and networks of protein structures.
PLoS Comput Biol. 2009 Mar;5(3):e1000331. doi: 10.1371/journal.pcbi.1000331. Epub 2009 Mar 27.