FrieslandCampina, Amersfoort, The Netherlands.
Anal Chim Acta. 2018 Oct 22;1028:1-10. doi: 10.1016/j.aca.2018.05.038. Epub 2018 May 15.
Efficient and reliable analysis of chemical analytical data is a great challenge due to the increase in data size, variety and velocity. New methodologies, approaches and methods are being proposed not only by the chemometrics community but also by other data science communities to extract relevant information from big datasets and deliver its value to different applications. Although big data analysis shares a common goal, different perspectives on and terms for big data are being discussed in the scientific literature and public media. The aim of this comprehensive review is to present common trends in the analysis of chemical analytical data across different data science fields, together with their data type-specific and generic challenges. First, common data science terms used in different data science fields are summarized and discussed. Second, systematic methodologies for planning and running big data analysis projects are presented together with their steps. Moreover, different aspects of analysis, such as assessing data quality, selecting data pre-processing strategies, data visualization and model validation, are considered in more detail. Finally, an overview of standard and new data analysis methods is provided, and their suitability for big analytical chemical datasets is briefly discussed.