Offroy Marc, Duponchel Ludovic
Laboratoire de Spectrochimie Infrarouge et Raman, LASIR, CNRS UMR 8516, Bât. C5, Université Lille 1, Sciences et Technologies, 59655, Villeneuve d'Ascq, Cedex, France.
Anal Chim Acta. 2016 Mar 3;910:1-11. doi: 10.1016/j.aca.2015.12.037. Epub 2016 Jan 5.
An important feature of experimental science is that data of various kinds are being produced at an unprecedented rate. This is mainly due to the development of new instrumental concepts and experimental methodologies. It is also clear that the nature of the acquired data has changed significantly. Indeed, in every area of science, data take the form of ever bigger tables, in which all but a few of the columns (i.e. variables) turn out to be irrelevant to the questions of interest, and, furthermore, we do not necessarily know which coordinates are the interesting ones. Big data in our biology, analytical chemistry or physical chemistry labs is a future that might be closer than any of us suppose. It is in this sense that new tools have to be developed in order to explore and exploit such data sets. Topological data analysis (TDA) is one of these. It was developed recently by topologists who discovered that topological concepts could be useful for data analysis. The main objective of this paper is to answer the question of why topology is well suited to the analysis of big data sets in many areas, and even more efficient than conventional data analysis methods. Raman analysis of single bacteria provides a good opportunity to demonstrate the potential of TDA for exploring various spectroscopic data sets under different experimental conditions (high noise level, with or without spectral preprocessing, wavelength shift, different spectral resolutions, missing data).