Pei Jimin, Schaeffer R Dustin, Cong Qian, Grishin Nick V
Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, USA.
Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, USA.
Proteins. 2025 May 26. doi: 10.1002/prot.26840.
Homology-based protein domain classification is a powerful tool for gaining biological insights into protein function. This classification process has been significantly enhanced by the availability of experimental structures and high-accuracy structural models generated by advanced tools such as AlphaFold. Our Evolutionary Classification of protein Domains (ECOD) database provides a continuously updated and refined domain classification system. Isolated ("orphan") protein domain families, which have a limited distribution in the protein universe, present a unique challenge in this classification process. These families lack clear or identifiable evolutionary relationships with other sequence families. While some isolated domain families may have emerged through de novo evolution, others potentially share common evolutionary origins with existing domain families but represent difficult cases for traditional classification methods. In this study, we conducted a manual analysis of a set of isolated families of small domains in ECOD. By exploring sequence, structural, and functional evidence, we uncovered distant members and likely homologous relationships between different isolated domain families that were previously unrecognized. Our analysis provides valuable insights into the evolution of isolated domain families and has led to improved classification within ECOD. This work enhances our understanding of protein evolution and underscores the importance of continuous refinement in domain classification systems as new data and analytical methods become available.