Department of Genetics, Stanford University, Stanford, CA 94305, USA.
Bioinformatics. 2021 May 5;37(6):815-821. doi: 10.1093/bioinformatics/btaa904.
Data normalization is an important step in processing proteomics data generated in mass spectrometry experiments; it aims to reduce sample-level variation and facilitate comparisons between samples. Previously published normalization methods primarily depend on the assumption that the distribution of protein expression is similar across all samples. However, this assumption fails when the protein expression data are generated from heterogeneous samples, such as different tissue types. This led us to develop a novel data-driven normalization method that corrects systematic bias while maintaining the underlying biological heterogeneity.
To robustly correct the systematic bias, we used the density-power-weight method to down-weight outliers and extended the one-dimensional robust fitting method described in previous work to our structured data. We then constructed a robustness criterion and developed a new normalization algorithm, called RobNorm. In simulation studies and in an analysis of real data from the Genotype-Tissue Expression (GTEx) project, we compared and evaluated the performance of RobNorm against other normalization methods. We found that the RobNorm approach exhibits the greatest reduction in systematic bias while maintaining across-tissue variation, especially for datasets from highly heterogeneous samples.
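The density-power-weight idea mentioned above can be illustrated in one dimension. The following is a minimal sketch, not the RobNorm implementation: it assumes a single Gaussian sample, weights each observation by the fitted density raised to a power `gamma`, and iterates weighted moment updates. The function name `robust_gaussian_fit`, the default `gamma=0.5`, and the median/MAD starting values are assumptions made for illustration only.

```python
import numpy as np


def robust_gaussian_fit(x, gamma=0.5, n_iter=100, tol=1e-8):
    """Illustrative 1-D robust Gaussian fit with density-power weights.

    Hypothetical helper, not part of RobNorm. Each point gets a weight
    proportional to f(x; mu, sigma2) ** gamma, so points far from the bulk
    of the data contribute very little to the estimates.
    """
    x = np.asarray(x, dtype=float)
    # Robust starting values (an assumption): median and MAD-based scale.
    mu = np.median(x)
    sigma2 = (1.4826 * np.median(np.abs(x - mu))) ** 2
    for _ in range(n_iter):
        # Log of the Gaussian density to the power gamma; the normalizing
        # constant is dropped because the weights are rescaled to sum to 1.
        log_w = -gamma * (x - mu) ** 2 / (2.0 * sigma2)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        mu_new = np.sum(w * x)
        # For Gaussian data the weighted variance equals sigma2 / (1 + gamma),
        # so multiplying by (1 + gamma) undoes that shrinkage.
        sigma2_new = (1.0 + gamma) * np.sum(w * (x - mu_new) ** 2)
        converged = abs(mu_new - mu) < tol and abs(sigma2_new - sigma2) < tol
        mu, sigma2 = mu_new, sigma2_new
        if converged:
            break
    return mu, sigma2
```

Setting `gamma` to 0 gives every point equal weight and recovers the ordinary mean and variance; larger values down-weight outliers more strongly at the cost of some efficiency on clean data.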
https://github.com/mwgrassgreen/RobNorm.
Supplementary data are available at Bioinformatics online.