Sokolenko Stanislav, Aucoin Marc G
Department of Chemical Engineering, University of Waterloo, 200 University Avenue West, Waterloo, N2L 3G1, ON, Canada.
BMC Syst Biol. 2015 Sep 4;9:51. doi: 10.1186/s12918-015-0197-4.
The growing ubiquity of metabolomic techniques has facilitated high-frequency time-course data collection for an increasing number of applications. While the concentration trends of individual metabolites can be modeled with common curve-fitting techniques, a more accurate representation of the data needs to consider effects that act on more than one metabolite in a given sample. To this end, we present a simple algorithm that applies nonparametric smoothing to all observed metabolites at once to identify and correct systematic error from dilution effects. In addition, we develop a simulation of metabolite concentration time-course trends to supplement available data and explore algorithm performance. Although we focus on nuclear magnetic resonance (NMR) analysis in the context of cell culture, a number of possible extensions are discussed.
Realistic metabolic data was successfully simulated using a four-step process. Starting with a set of metabolite concentration time-courses from a metabolomic experiment, each time-course was classified as either increasing, decreasing, concave, or approximately constant. Trend shapes were simulated from generic functions corresponding to each classification. The resulting shapes were then scaled to simulated compound concentrations. Finally, the scaled trends were perturbed using a combination of random and systematic errors. To detect systematic errors, a nonparametric fit was applied to each trend and percent deviations were calculated at every time-point. Systematic errors could be identified at time-points where the median percent deviation exceeded a threshold value, determined by the choice of smoothing model and the number of observed trends. Regardless of model, increasing the number of observations over a time-course resulted in more accurate error estimates, although the improvement was not particularly large between 10 and 20 samples per trend. The presented algorithm was able to identify systematic errors as small as 2.5% under a wide range of conditions.
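The simulation and detection steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the linear and quadratic trend shapes, the Gaussian-kernel local linear smoother, the bandwidth of 1.5, the 1% random noise, and all function names are assumptions chosen only for illustration. The abstract does not specify the generic functions, the smoothing model, or the threshold values.

```python
import numpy as np


def simulate_trends(t, n_per_class=5, noise_sd=0.01, seed=0):
    """Simulate metabolite time-courses (rows) in four trend classes.

    The shape functions below are illustrative placeholders, not the
    generic functions used in the original work.
    """
    rng = np.random.default_rng(seed)
    u = (t - t.min()) / (t.max() - t.min())  # normalized time in [0, 1]
    shapes = {
        "increasing": u,
        "decreasing": 1.0 - u,
        "concave": 4.0 * u * (1.0 - u),
        "constant": np.zeros_like(u),
    }
    trends = []
    for shape in shapes.values():
        for _ in range(n_per_class):
            # Scale each shape to a (hypothetical) concentration range;
            # the 0.5 offset keeps every concentration above zero.
            scale = rng.lognormal(mean=0.0, sigma=1.0)
            trends.append(scale * (0.5 + shape))
    conc = np.array(trends)
    # Random (per-measurement) relative error.
    return conc * (1.0 + rng.normal(0.0, noise_sd, conc.shape))


def add_dilution(conc, index, fraction):
    """Apply a systematic dilution error to every metabolite in one sample."""
    out = conc.copy()
    out[:, index] *= 1.0 - fraction
    return out


def local_linear_smooth(t, y, bandwidth):
    """Local linear regression with Gaussian kernel weights (assumed smoother)."""
    fit = np.empty_like(y)
    for i, t0 in enumerate(t):
        k = np.exp(-0.5 * ((t - t0) / bandwidth) ** 2)
        # np.polyfit weights multiply the residuals, hence the square root.
        slope, intercept = np.polyfit(t, y, 1, w=np.sqrt(k))
        fit[i] = slope * t0 + intercept
    return fit


def median_percent_deviation(t, conc, bandwidth=1.5):
    """Median across metabolites of the percent deviation from each fit."""
    fits = np.array([local_linear_smooth(t, y, bandwidth) for y in conc])
    return np.median(100.0 * (conc - fits) / fits, axis=0)
```

As a usage example, one might simulate 15 time-points with `t = np.linspace(0.0, 10.0, 15)`, inject a 10% dilution with `add_dilution(simulate_trends(t), index=7, fraction=0.10)`, and flag time-points where `np.abs(median_percent_deviation(t, conc))` exceeds a chosen threshold (3.0 here would be an arbitrary choice). Because the diluted sample also pulls each fit toward itself, the median deviation tends to underestimate the true dilution, which is one reason the threshold depends on the smoothing model.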
Both the simulation framework and error correction method represent examples of time-course analysis that can be applied to further developments in ¹H-NMR methodology and the more general application of quantitative metabolomics.