Helliwell John R
Department of Chemistry, University of Manchester, Manchester M13 9PL, United Kingdom.
Struct Dyn. 2019 Oct 25;6(5):054306. doi: 10.1063/1.5124439. eCollection 2019 Sep.
A publication is an important narrative of the work done and the interpretations made by researchers in securing a scientific discovery. As The Royal Society neatly states, though, "Nullius in verba" ("Take nobody's word for it"): the role of the underpinning data is paramount. Preserving that data with the article provides objectivity, because readers are then able to check the calculation decisions of the authors. But how can full data archiving be achieved? This is the raw data archiving challenge, both in the size of the data and in the need for correct metadata. The archiving of processed diffraction data and final derived molecular coordinates in crystallography has achieved an exemplary state of the art relative to most fields. One can credit the IUCr with developing exemplary peer review procedures covering the narrative, the underpinning structure factors and coordinate data, and the validation report, through its checkcif development and submission system, introduced for Acta Cryst. C and subsequently developed for its other chemistry journals. The crystallographic databases have likewise achieved amazing success and sustainability over the last 50 years or so. The wider science data scene is celebrating the FAIR data accord, namely, that data be Findable, Accessible, Interoperable, and Reusable [Wilkinson et al., "Comment: The FAIR guiding principles for scientific data management and stewardship," Sci. Data, 160018 (2016)]. Some social scientists also emphasize that more than FAIR is needed: the data should be "FACT," an acronym meaning Fair, Accurate, Confidential, and Transparent [van der Aalst et al., "Responsible data science," Bus. Inf. Syst. Eng. (5), 311-313 (2017)], this being the issue of ensuring reproducibility and not just reusability. (Confidentiality is unlikely to be relevant to our data.) Acta Cryst. B, C, E, and IUCrData are the closest I know to being both FACT and FAIR where, I repeat for due emphasis, the narrative, the automatic "general" validation checks, and the underpinning data are checked thoroughly by subject specialists (i.e., the specialist referees). IUCr Journals are also the best that I know of for encouraging and then expediting the citation of the DOI for a raw diffraction dataset in a publication; examples can be found in IUCrJ, Acta Cryst. D, and Acta Cryst. F. The wish for a checkcif for raw diffraction data has been championed by the IUCr Diffraction Data Deposition Working Group and its successor, the IUCr Committee on Data.