Cheng Hua, Liao Yuxing, Schaeffer R Dustin, Grishin Nick V
Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, 75390.
Department of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, 75390.
Proteins. 2015 Jul;83(7):1238-51. doi: 10.1002/prot.24818. Epub 2015 May 8.
ECOD (Evolutionary Classification Of protein Domains) is a comprehensive and up-to-date protein structure classification database. The majority of new structures released from the PDB (Protein Data Bank) each week already have close homologs in the ECOD hierarchy and thus can be reliably partitioned into domains and classified by software without manual intervention. However, proteins that lack confidently detectable homologs require careful analysis by experts. Although many bioinformatics resources rely on expert curation to some degree, specific examples of how this curation occurs and in which cases it is necessary are not always described. Here, we illustrate the manual classification strategy in ECOD by example, focusing on two major issues in protein classification: domain partitioning and the relationship between homology and similarity scores. Most examples show recently released and manually classified PDB structures. We discuss multi-domain proteins, discordance between sequence and structural similarities, difficulties in assessing homology from scores, and integral membrane proteins homologous to soluble proteins. By promptly assimilating newly available structures into its hierarchy, ECOD strives to provide the most accurate and up-to-date view of the protein structure world through combined computational and expert-driven analysis.