Kachala Michael, Westbrook John, Svergun Dmitri
Hamburg Outstation, European Molecular Biology Laboratory, Notkestrasse 85, Hamburg 22607, Germany.
RCSB PDB, Department of Chemistry and Chemical Biology and Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.
J Appl Crystallogr. 2016 Feb 1;49(Pt 1):302-310. doi: 10.1107/S1600576715024942.
Recent advances in small-angle scattering (SAS) experimental facilities and data analysis methods have prompted a dramatic increase in the number of users and of projects conducted, causing an upsurge in the number of objects studied, experimental data available and structural models generated. To organize the data and models and make them accessible to the community, the Task Forces on SAS and hybrid methods for the International Union of Crystallography and the Worldwide Protein Data Bank envisage developing a federated approach to SAS data and model archiving. Within the framework of this approach, the existing databases may exchange information and provide independent but synchronized entries to users. At present, no means of exchanging information between the various SAS databases has been established, which can lead to duplicated and incompatible entries and limits the opportunities for data-driven research by SAS users. In this work, a solution is developed to resolve these issues and provide a universal exchange format for the community, based on the widely adopted crystallographic information framework (CIF). The previous version of the sasCIF format, implemented as an extension of the core CIF dictionary, has been available since 2000 to facilitate SAS data exchange between laboratories. The sasCIF format has now been extended to describe comprehensively the necessary experimental information, results and models, including relevant metadata for SAS data analysis and for deposition into a database. Processing tools for these files have been developed, and these are available both as standalone open-source programs and integrated into the SAS Biological Data Bank, allowing the export and import of data entries as sasCIF files. Software modules that save the relevant information directly from beamline data-processing pipelines in sasCIF format have also been developed. This update of sasCIF and the relevant tools is an important step in standardizing the way SAS data are presented and exchanged, making the results easily accessible to users and further promoting the application of SAS in the structural biology community.
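As an illustration of what a CIF-based exchange format involves, the following is a minimal sketch of reading a small subset of CIF syntax (data blocks, single-line tag-value items and simple loops) in Python. It is not the sasCIF processing software described above, and it does not implement the full CIF grammar. The function name `parse_simple_cif`, the tag names and the numerical values in the example are hypothetical and are not taken from the sasCIF dictionary.

```python
import shlex


def parse_simple_cif(text):
    """Return {block_name: {tag: value or list of values}} for a CIF subset.

    Illustrative only: multi-line text fields, save frames and CIF-specific
    quoting rules are not handled.
    """
    blocks = {}
    current = None      # items of the data block being read
    loop_tags = None    # tags of the loop being read, or None outside a loop
    in_header = False   # True while still collecting the loop's tag names

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.lower().startswith("data_"):
            current = blocks.setdefault(line[5:], {})
            loop_tags = None
            continue
        if current is None:
            raise ValueError("item found before any data_ block")
        if line.lower() == "loop_":
            loop_tags, in_header = [], True
            continue
        if line.startswith("_"):
            if loop_tags is not None and in_header:
                loop_tags.append(line)   # another column of the loop
                current[line] = []
                continue
            loop_tags = None             # a new tag ends any loop body
            tag, *rest = line.split(None, 1)
            current[tag] = shlex.split(rest[0])[0] if rest else None
            continue
        if loop_tags:
            in_header = False
            values = shlex.split(line)   # one row of loop values
            if len(values) != len(loop_tags):
                raise ValueError(f"expected {len(loop_tags)} values: {line!r}")
            for tag, value in zip(loop_tags, values):
                current[tag].append(value)
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return blocks


# Hypothetical entry; tag names and numbers are made up for illustration.
example = """
data_example_entry
_example.title 'sample A'
loop_
_example_scan.q
_example_scan.intensity
0.01 1250.0
0.02 1180.5
"""
entry = parse_simple_cif(example)["example_entry"]
print(entry["_example.title"])
print(entry["_example_scan.q"])
```

All values are kept as strings here; a real reader would consult the dictionary to decide which items are numeric and would validate tags against it.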