Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA.
Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Bioinformatics. 2019 May 1;35(9):1518-1526. doi: 10.1093/bioinformatics/bty828.
Decreasing costs are making it feasible to perform time series proteomics and genomics experiments with more replicates and higher resolution than ever before. With more replicates and time points, proteome- and genome-wide patterns of expression are more readily discernible. However, these larger experiments require more batches, exacerbating batch effects and increasing the number of bias trends. In proteomics, where methods frequently result in missing data, this increasing scale also decreases the number of peptides observed in all samples. The sources of batch effects and missing data are incompletely understood, necessitating novel techniques.
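Why scale erodes complete observations can be shown with a deliberately simplified model. Assume, as a hypothetical for illustration only, that each peptide is missing independently in each sample with probability p. The expected fraction of peptides observed in all n samples is then (1 - p)^n. Real missingness in mass spectrometry is not independent (it depends on abundance, run, and batch), so treat this as a back-of-the-envelope sketch. The function name `fraction_fully_observed` is hypothetical.

```python
def fraction_fully_observed(p_missing: float, n_samples: int) -> float:
    """Expected fraction of peptides observed in every one of n_samples,
    ASSUMING each peptide is missing independently in each sample with
    probability p_missing (a simplified, hypothetical model)."""
    if not 0.0 <= p_missing <= 1.0:
        raise ValueError("p_missing must be between 0 and 1")
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    return (1.0 - p_missing) ** n_samples


# Hypothetical 2% per-sample missingness across growing experiment sizes.
for n in (12, 24, 48, 96):
    print(f"{n:>3} samples: {fraction_fully_observed(0.02, n):.1%} fully observed")
```

Under this assumption, a 2% per-sample missing rate leaves roughly 78% of peptides fully observed across 12 samples but only about 14% across 96, which is why larger designs push analyses toward imputation.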
Here we show that, by exploiting the structure of time series experiments, it is possible to model and remove batch effects accurately and reproducibly. We implement this approach in the Learning and Imputation for Mass-spec Bias Reduction (LIMBR) software, which builds on previous block-based models of batch effects and adds features specific to time series and circadian studies. Because time series proteomics experiments are often plagued by missing data points, LIMBR also integrates an imputation system. By combining imputation and time-series-tailored bias modeling in one straightforward software package, we expect LIMBR to significantly improve the quality and ease of large-scale proteomics and genomics time series experiments.
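The two ideas above, imputing missing values and using time series structure to isolate bias trends, can be sketched generically with NumPy. This is not LIMBR's algorithm or API: the function names are hypothetical, the imputation is a plain k-nearest-neighbour stand-in, and the bias step is a simple surrogate-variable-style stand-in. It assumes replicates at the same time point should agree, so the per-time-point mean serves as the signal model; shared patterns left in the within-time-point residuals are treated as bias trends and removed. The number of trends is supplied by the user here, whereas a real tool would need a principled way to choose it, and batches fully confounded with time points cannot be separated this way.

```python
import numpy as np


def knn_impute(x, k=5):
    """Hypothetical helper: fill NaNs in a peptides x samples matrix with the
    column mean over the k most similar rows, where similarity is the mean
    squared difference over columns observed in both rows."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    observed = ~np.isnan(x)
    for i in np.where(~observed.all(axis=1))[0]:
        shared = observed & observed[i]          # columns observed in both rows
        counts = shared.sum(axis=1)
        diff = np.where(shared, x - x[i], 0.0)   # ignore non-shared columns
        dist = np.full(x.shape[0], np.inf)
        ok = counts > 0
        dist[ok] = (diff[ok] ** 2).sum(axis=1) / counts[ok]
        dist[i] = np.inf                         # never use the row itself
        for j in np.where(~observed[i])[0]:
            donors = np.where(observed[:, j] & np.isfinite(dist))[0]
            if donors.size == 0:
                continue                         # no donor: leave NaN
            nearest = donors[np.argsort(dist[donors])[:k]]
            out[i, j] = x[nearest, j].mean()
    return out


def remove_bias_trends(x, time_labels, n_trends=1):
    """Hypothetical helper: remove the top n_trends shared sample-space
    patterns from within-time-point residuals of a complete matrix."""
    x = np.asarray(x, dtype=float)
    if np.isnan(x).any():
        raise ValueError("impute missing values first")
    labels = np.asarray(time_labels)
    fitted = np.empty_like(x)
    for t in np.unique(labels):
        cols = labels == t
        # Replicates at one time point should agree: their mean is the signal model.
        fitted[:, cols] = x[:, cols].mean(axis=1, keepdims=True)
    residuals = x - fitted
    # Rows of vt are orthonormal patterns across samples; the leading ones
    # are the trends shared by the most peptides.
    _, _, vt = np.linalg.svd(residuals, full_matrices=False)
    trends = vt[:n_trends]
    bias = residuals @ trends.T @ trends         # projection onto the trends
    return x - bias, trends
```

In use, a matrix would be passed through `knn_impute` first and then `remove_bias_trends` with one time label per sample column. Subtracting only the projection of the residuals, rather than regressing the raw data on the trends, keeps the time-point means (the biological signal) untouched.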
Python code and documentation are available for download at https://github.com/aleccrowell/LIMBR, and LIMBR can be downloaded and installed with its dependencies using 'pip install limbr'.
Supplementary data are available at Bioinformatics online.