Integration of open access literature into the RCSB Protein Data Bank using BioLit.

Affiliation

San Diego Supercomputer Center, University of California San Diego, 9500 Gilman Drive, Mailcode 0505 La Jolla, CA 92093-0505, USA.

Publication information

BMC Bioinformatics. 2010 Apr 29;11:220. doi: 10.1186/1471-2105-11-220.

DOI:10.1186/1471-2105-11-220

PMID:20429930

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2880030/

Abstract

BACKGROUND

Biological data have traditionally been stored and made publicly available through a variety of on-line databases, whereas biological knowledge has traditionally been found in the printed literature. With journals now on-line and providing an increasing amount of open access content, often free of copyright restriction, this distinction between database and literature is blurring. To exploit this opportunity we present the integration of open access literature with the RCSB Protein Data Bank (PDB).

RESULTS

BioLit provides an enhanced view of articles, marking up semantic data and linking to biological databases based on the content of the article. For example, words that match terms in existing biological ontologies are highlighted, and database identifiers are linked to their database of origin. Among other functions, it identifies PDB IDs mentioned in the open access literature by parsing the full text of all research articles in PubMed Central (PMC) and exposes the results as simple XML Web Services. Here, we integrate BioLit results with the RCSB PDB website by using these services to find PDB IDs that are mentioned in research articles and then retrieving the abstracts, figures, and text excerpts for those articles. A new RCSB PDB literature view permits browsing through the figures and abstracts of the articles that mention a given structure. The BioLit Web Services that provide the underlying data are publicly accessible, and a Java client library is provided for querying them.
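
To make the idea of finding PDB IDs in full text concrete, here is a minimal Python sketch. It is not BioLit's actual method, which the abstract does not describe. It assumes the classic four-character PDB ID format (a digit from 1 to 9 followed by three alphanumeric characters) and adds two illustrative heuristics of our own: a candidate must contain at least one letter (to skip years and counts) and must occur near the word "PDB". The function name `find_pdb_ids` and the `window` parameter are hypothetical.

```python
import re

# Classic PDB ID shape: a digit 1-9 followed by three alphanumeric characters.
PDB_ID = re.compile(r"\b[1-9][A-Za-z0-9]{3}\b")
# Assumed context cue (our heuristic, not BioLit's): the word "PDB" nearby.
CUE = re.compile(r"\bPDB\b", re.IGNORECASE)


def find_pdb_ids(text, window=40):
    """Return sorted, upper-cased PDB ID candidates found near a 'PDB' cue.

    A candidate is kept only if it contains at least one letter and starts
    within `window` characters of some occurrence of the cue word.
    """
    cue_positions = [m.start() for m in CUE.finditer(text)]
    found = set()
    for m in PDB_ID.finditer(text):
        token = m.group()
        if not any(c.isalpha() for c in token):
            continue  # skip purely numeric tokens such as years
        if any(abs(m.start() - p) <= window for p in cue_positions):
            found.add(token.upper())
    return sorted(found)


sample = ("The coordinates (PDB ID 1TIM) were superposed on 4hhb; "
          "data were collected in 2009.")
print(find_pdb_ids(sample))  # ['1TIM', '4HHB']
```

Both heuristics trade recall for precision: an ID mentioned far from any cue word is missed, so a real pipeline would likely validate candidates against the list of released PDB entries instead.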

CONCLUSIONS

The integration between literature and websites, as demonstrated here with the RCSB PDB, provides a broader view of how a given structure has been analyzed and used. This approach detects the mention of a PDB structure even if it is not formally cited in the paper. Other structures related through the same literature references can also be identified, possibly providing new scientific insight. To our knowledge, this is the first time that database and literature have been integrated in this way, and it speaks to the opportunities afforded by open and free access to both database and literature content.
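
The idea that structures can be related through shared literature can be sketched as a simple co-mention lookup. The Python example below is an illustration under stated assumptions, not the RCSB PDB implementation: it assumes the article-to-PDB-ID mentions have already been collected into a dictionary, and the article identifiers, the `mentions` data, and the function name `related_structures` are all hypothetical.

```python
from collections import defaultdict


def related_structures(mentions, query_id):
    """Map each structure co-mentioned with `query_id` to the shared articles.

    `mentions` maps an article identifier to the set of PDB IDs it mentions.
    """
    shared = defaultdict(set)
    for article, pdb_ids in mentions.items():
        if query_id in pdb_ids:
            for other in pdb_ids:
                if other != query_id:
                    shared[other].add(article)
    return dict(shared)


# Hypothetical data for illustration only; not real BioLit output.
mentions = {
    "article-A": {"1TIM", "4HHB"},
    "article-B": {"1TIM", "2LYZ", "4HHB"},
    "article-C": {"2LYZ"},
}

for pdb_id, articles in sorted(related_structures(mentions, "1TIM").items()):
    print(pdb_id, sorted(articles))
```

Ranking the related structures by the number of shared articles would be a natural next step, since a pair mentioned together in many papers is more likely to be meaningfully connected.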
