Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States.
Advanced Clinical Biosystems Research Institute, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States.
J Proteome Res. 2023 Feb 3;22(2):471-481. doi: 10.1021/acs.jproteome.2c00671. Epub 2023 Jan 25.
Recent surges in large-scale mass spectrometry (MS)-based proteomics studies demand a concurrent rise in methods that enable reliable and reproducible data analysis. Protein quantification in MS analysis can be affected by variation in technical factors, such as sample preparation and data acquisition conditions, leading to batch effects that add noise to the data set. This may in turn undermine the validity of any biological conclusions drawn from the data. Here we present Batch-effect Identification, Representation, and Correction of Heterogeneous data (BIRCH), a workflow for analyzing and correcting batch effects through an automated, versatile, and easy-to-use web-based tool, with the goal of eliminating technical variation. BIRCH also supports diagnosing the data to check for the presence of batch effects and the feasibility of batch correction, as well as imputation to handle missing values in the data set. To illustrate the relevance of the tool, we explore two case studies, an iPSC-derived cell study and a COVID-19 vaccine study, that show different context-specific use cases. Ultimately, this tool can serve as a powerful approach for eliminating technical bias while retaining biological variation, toward understanding disease mechanisms and potential therapeutics.
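The abstract does not state which correction algorithms BIRCH uses, so the following is only an illustrative sketch of what "removing technical variation while keeping biological variation" can mean in practice. It swaps in a deliberately simple technique, per-batch median centering of log2 protein intensities, and is not BIRCH's method. The function name `median_center_by_batch` and the dictionary-of-lists data layout are hypothetical; missing values are represented as `None` and are ignored rather than imputed.

```python
from statistics import median
from typing import Dict, List, Optional

# Hypothetical layout: one list of log2 intensities per protein,
# one entry per sample, None marking a missing value.
Row = List[Optional[float]]


def median_center_by_batch(data: Dict[str, Row], batches: List[str]) -> Dict[str, Row]:
    """Illustrative batch correction by per-batch median centering (not BIRCH's algorithm).

    For each protein, observed values in each batch are shifted so that the
    batch median equals the protein's overall median. Missing values stay None
    and are excluded when computing medians.
    """
    corrected: Dict[str, Row] = {}
    for protein, values in data.items():
        if len(values) != len(batches):
            raise ValueError(f"{protein}: expected {len(batches)} values, got {len(values)}")
        observed = [v for v in values if v is not None]
        if not observed:
            # Nothing to correct for a protein that was never quantified.
            corrected[protein] = list(values)
            continue
        overall = median(observed)
        # Median of the observed values within each batch.
        batch_median: Dict[str, float] = {}
        for b in set(batches):
            in_batch = [v for v, lab in zip(values, batches) if lab == b and v is not None]
            if in_batch:
                batch_median[b] = median(in_batch)
        # Any observed value guarantees its batch has a median, so the lookup is safe.
        corrected[protein] = [
            None if v is None else v - batch_median[lab] + overall
            for v, lab in zip(values, batches)
        ]
    return corrected


if __name__ == "__main__":
    data = {"P1": [10.0, 11.0, 14.0, 15.0], "P2": [8.0, None, 10.0, 12.0]}
    batches = ["A", "A", "B", "B"]
    print(median_center_by_batch(data, batches))
```

In this toy input, batch B reads about 4 log2 units higher than batch A for P1; after centering, both batches sit around the same level and only the within-batch differences remain. Centering of this kind is only safe when biological groups are spread across batches: if a condition is measured in a single batch, the batch shift and the biological difference cannot be separated, which is why checking the feasibility of correction before applying it matters.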