Podwojski Katharina, Fritsch Arno, Chamrad Daniel C, Paul Wolfgang, Sitek Barbara, Stühler Kai, Mutzel Petra, Stephan Christian, Meyer Helmut E, Urfer Wolfgang, Ickstadt Katja, Rahnenführer Jörg
Fakultät Statistik, Technische Universität Dortmund, 44221 Dortmund, Germany.
Bioinformatics. 2009 Mar 15;25(6):758-64. doi: 10.1093/bioinformatics/btp052. Epub 2009 Jan 28.
Proteomics has become of particular interest for biomarker discovery and drug development. In particular, the combination of liquid chromatography and mass spectrometry (LC/MS) has proven to be a powerful technique for analyzing protein mixtures. Clinically oriented proteomic studies will have to compare hundreds of LC/MS runs at a time, and comparing different runs requires sophisticated preprocessing. An important preprocessing step is the retention time (rt) alignment of LC/MS runs. Non-linear shifts in rt between pairs of LC/MS runs make this a crucial and non-trivial problem.
To demonstrate the particular importance of correcting non-linear rt shifts, we evaluate and compare different alignment algorithms. We present and analyze two versions of a new algorithm based on regression techniques. One version assumes and estimates only linear shifts, and the other also allows for the estimation of non-linear shifts. As an example of a different type of alignment method, we use an established algorithm based on shifting vectors, which we adapted so that it can also correct non-linear shifts. In a simulation study, we show that rt alignment procedures that can estimate non-linear shifts yield clearly better alignments, even under mild non-linear deviations.
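The contrast between linear and non-linear regression-based alignment can be illustrated with a minimal Python sketch. This is not the authors' R implementation. It assumes peaks have already been matched between a reference run and a second run. It simulates a hypothetical rt distortion (a linear drift plus a mild sine wave, in arbitrary time units) and compares a straight-line regression with a degree-5 polynomial regression. The polynomial stands in for a generic non-linear regression and is not necessarily the smoother used in the paper. All function names and parameter values are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_shift(t):
    """Hypothetical rt distortion: linear drift plus a mild non-linear wave."""
    return 0.02 * t + 1.5 * np.sin(2 * np.pi * t / 100.0)


def fit_alignment(rt_run, rt_ref, degree):
    """Regress reference rt on observed rt of matched peaks.

    degree=1 gives a purely linear alignment; degree>1 gives a
    polynomial (non-linear) alignment. Illustrative choice only.
    """
    return Polynomial.fit(rt_run, rt_ref, deg=degree)


def alignment_rmse(model, shift=true_shift, n_grid=500):
    """RMSE between aligned rts and true reference rts on a noise-free grid."""
    rt_ref = np.linspace(5.0, 95.0, n_grid)
    rt_run = rt_ref + shift(rt_ref)
    return float(np.sqrt(np.mean((model(rt_run) - rt_ref) ** 2)))


rng = np.random.default_rng(0)
# Simulated matched peak pairs: reference rt and the (distorted, noisy) rt
# of the same peak in the run to be aligned.
peaks_ref = rng.uniform(0.0, 100.0, size=200)
peaks_run = peaks_ref + true_shift(peaks_ref) + rng.normal(0.0, 0.1, size=200)

linear = fit_alignment(peaks_run, peaks_ref, degree=1)
nonlinear = fit_alignment(peaks_run, peaks_ref, degree=5)

rmse_linear = alignment_rmse(linear)
rmse_nonlinear = alignment_rmse(nonlinear)
print(f"linear RMSE:     {rmse_linear:.3f}")
print(f"non-linear RMSE: {rmse_nonlinear:.3f}")
```

On this simulated distortion, the non-linear fit should leave a much smaller residual error than the linear fit, even though the non-linear component is modest. The polynomial degree is a tuning choice. With few matched peaks, a high degree can overfit, which is one reason smoother-based regressions are often preferred in practice.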
R code for the regression-based alignment methods and simulated datasets are available at http://www.statistik.tu-dortmund.de/genetik-publikationen-alignment.html.
Supplementary data are available at Bioinformatics online.