Rocca-Serra Philippe, Salek Reza M, Arita Masanori, Correa Elon, Dayalan Saravanan, Gonzalez-Beltran Alejandra, Ebbels Tim, Goodacre Royston, Hastings Janna, Haug Kenneth, Koulman Albert, Nikolski Macha, Oresic Matej, Sansone Susanna-Assunta, Schober Daniel, Smith James, Steinbeck Christoph, Viant Mark R, Neumann Steffen
Oxford e-Research Centre, University of Oxford, 7 Keble Road, Oxford, OX1 3QG UK.
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK.
Metabolomics. 2016;12:14. doi: 10.1007/s11306-015-0879-3. Epub 2015 Nov 17.
Thousands of articles using metabolomics approaches are published every year. With the increasing amounts of data being produced, merely describing investigations as text in manuscripts is no longer sufficient to enable reuse: the underlying data needs to be published together with the findings in the literature, both to maximise the benefit from public and private expenditure and to take advantage of an enormous opportunity to improve scientific reproducibility in metabolomics and cognate disciplines. Reporting recommendations in metabolomics started to emerge about a decade ago and were mostly concerned with inventories of the information that had to be reported in the literature for consistency. In recent years, metabolomics data standards have developed extensively to cover the primary research data, the derived results, the experimental description and, importantly, the metadata, all in a machine-readable form. These include vendor-independent data standards such as mzML for mass spectrometry and nmrML for NMR raw data, both of which have enabled the scientific community to develop advanced data processing algorithms. Standards such as ISA-Tab cover essential metadata, including the experimental design, the applied protocols, and the associations between samples, data files and experimental factors needed for further statistical analysis. Altogether, they pave the way for both reproducible research and data reuse, including meta-analyses. Further incentives to prepare standards-compliant data sets include new opportunities to publish data sets, but adoption also requires a little "arm twisting" in the author guidelines of scientific journals so that authors submit their data sets to public repositories such as the NIH Metabolomics Workbench or MetaboLights at EMBL-EBI. In the present article, we look at standards for data sharing, investigate their impact in metabolomics and give suggestions to improve their adoption.