Extension of research data repository system to support direct compute access to biomedical datasets: enhancing Dataverse to support large datasets.

Author information

McKinney Bill, Meyer Peter A, Crosas Mercè, Sliz Piotr

Affiliations

Department of Biochemistry and Molecular Pharmacology and SBGrid Initiative, Harvard Medical School, Boston, Massachusetts, and the Dataverse Project, Harvard University, Cambridge, Massachusetts.

Publication information

Ann N Y Acad Sci. 2017 Jan;1387(1):95-104. doi: 10.1111/nyas.13272. Epub 2016 Nov 10.

Abstract

Access to experimental X-ray diffraction image data is important for validation and reproduction of macromolecular models and indispensable for the development of structural biology processing methods. In response to the evolving needs of the structural biology community, we recently established a diffraction data publication system, the Structural Biology Data Grid (SBDG, data.sbgrid.org), to preserve primary experimental datasets supporting scientific publications. All datasets published through the SBDG are freely available to the research community under a public domain dedication license, with metadata compliant with the DataCite Schema (schema.datacite.org). A proof-of-concept study demonstrated community interest and utility. Publication of large datasets is a challenge shared by several fields, and the SBDG has begun collaborating with the Institute for Quantitative Social Science at Harvard University to extend the Dataverse (dataverse.org) open-source data repository system to structural biology datasets. Several extensions are necessary to support the size and metadata requirements for structural biology datasets. In this paper, we describe one such extension: functionality supporting preservation of file system structure within Dataverse, which is essential both for in-place computation and for supporting non-HTTP data transfers.
