Department of Computer Science, University College London, London WC1E 6BT, UK.
Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK.
Science. 2024 Nov;386(6721):eadq4946. doi: 10.1126/science.adq4946. Epub 2024 Nov 1.
The AlphaFold Protein Structure Database (AFDB) contains more than 214 million predicted protein structures composed of domains, which are independently folding units found in multiple structural and functional contexts. Identifying domains can enable many functional and evolutionary analyses but has remained challenging because of the sheer scale of the data. Using deep learning methods, we have detected and classified every domain in the AFDB, producing The Encyclopedia of Domains. We detected nearly 365 million domains, over 100 million more than can be found by sequence methods, covering more than 1 million taxa. Reassuringly, 77% of the nonredundant domains are similar to known superfamilies, greatly expanding representation of their domain space. We uncovered more than 10,000 new structural interactions between superfamilies and thousands of new folds across the fold space continuum.