RCSB Protein Data Bank: Efficient Searching and Simultaneous Access to One Million Computed Structure Models Alongside the PDB Structures Enabled by Architectural Advances.

Author information

Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA.

Publication information

J Mol Biol. 2023 Jul 15;435(14):167994. doi: 10.1016/j.jmb.2023.167994. Epub 2023 Feb 2.

Abstract

The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) provides open access to experimentally-determined three-dimensional (3D) structures of biomolecules. The RCSB PDB RCSB.org research-focused web portal is used annually by many millions of users around the world. They access biostructure information, run complex queries utilizing various search services (e.g., full-text, structural and chemical attribute, chemical, sequence, and structure similarity searches), and visualize macromolecules in 3D, all at no charge and with no limitations on data usage. Notwithstanding more than 24,000-fold growth of the PDB over the past five decades, experimentally-determined structures are only available for a small subset of the millions of proteins of known sequence. Recently developed machine learning software tools can predict 3D structures of proteins at accuracies comparable to lower-resolution experimental methods. The RCSB PDB now provides access to ∼1,000,000 Computed Structure Models (CSMs) of proteins coming from AlphaFold DB and the ModelArchive alongside ∼200,000 experimentally-determined PDB structures. Both CSMs and PDB structures are available on RCSB.org and via well-established RCSB PDB Data, Search, and 1D-Coordinates application programming interfaces (APIs). Simultaneous delivery of PDB data and CSMs provides users with access to complementary structural information across the human proteome and those of model organisms and selected pathogens. API enhancements are backwards-compatible and programmatic users can "opt in" to access CSMs with minimal effort. Herein, we describe modifications to RCSB PDB cyberinfrastructure required to support sixfold scaling of 3D biostructure data delivery and lay the groundwork for scaling to accommodate hundreds of millions of CSMs.
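The "opt in" behaviour described above can be illustrated with a small sketch of a Search API request body. This is not taken from the paper: the endpoint URL, the `full_text` service, and the `results_content_type` request option are assumptions based on the public RCSB PDB Search API and should be checked against the current documentation. The helper name `build_query` is hypothetical. The sketch only builds the JSON payload and sends nothing over the network.

```python
import json

# Assumed Search API endpoint; verify against the current RCSB PDB docs.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def build_query(text, include_csms=False):
    """Build a full-text search payload (hypothetical helper).

    By default only experimental PDB structures are requested, which
    mirrors the backwards-compatible behaviour the abstract describes.
    Setting include_csms=True opts in to Computed Structure Models.
    """
    content_types = ["experimental"]
    if include_csms:
        # Assumed option value for CSMs in the Search API.
        content_types.append("computational")
    return {
        "query": {
            "type": "terminal",
            "service": "full_text",
            "parameters": {"value": text},
        },
        "return_type": "entry",
        "request_options": {"results_content_type": content_types},
    }


if __name__ == "__main__":
    # Print the payload that would be POSTed to SEARCH_URL.
    print(json.dumps(build_query("hemoglobin", include_csms=True), indent=2))
```

Under these assumptions, an existing client that never sets `results_content_type` keeps receiving only experimental structures, and adding one list entry is enough to include CSMs in the results.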

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cb9e/11514064/8e3563a7fe00/nihms-2030557-f0001.jpg
