
A Mixed-Effects Model for Incomplete Data from Labeling-Based Quantitative Proteomics Experiments

Authors

Chen Lin S, Wang Jiebiao, Wang Xianlong, Wang Pei

Affiliations

Department of Public Health Sciences, University of Chicago, 5841 S Maryland Ave, Chicago, Illinois, USA.

Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, Washington 98109, USA.

Publication information

Ann Appl Stat. 2017 Mar;11(1):114-138. doi: 10.1214/16-AOAS994. Epub 2017 Apr 8.

DOI: 10.1214/16-AOAS994
PMID: 29743963
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5937554/

Abstract

In mass spectrometry (MS)-based quantitative proteomics research, the emerging iTRAQ (isobaric tag for relative and absolute quantitation) and TMT (tandem mass tags) techniques have been widely adopted for high-throughput protein profiling. In a typical iTRAQ/TMT proteomics study, samples are grouped into batches, and each batch is processed by one multiplex experiment, in which the abundances of thousands of proteins/peptides in a batch of samples can be measured simultaneously. The multiplex labeling technique greatly enhances the throughput of protein quantification. However, the technical variation across different iTRAQ/TMT multiplex experiments is often large due to the dynamic nature of MS instruments. This leads to strong batch effects in the iTRAQ/TMT data. Moreover, the iTRAQ/TMT data often contain substantial batch-level nonignorable missing entries. Specifically, the abundance measures of a given protein/peptide are often either observed or missing altogether in all the samples from the same batch, with the missing probability depending on the combined batch-level abundances. We term this unique missing-data mechanism the Batch-level Abundance-Dependent Missing-data Mechanism (BADMM). We introduce a new method, mixEMM, for analyzing iTRAQ/TMT data with batch effects and batch-level nonignorable missingness. The mixEMM method employs a linear mixed-effects model and explicitly models the batch effects and the BADMM. With simulation studies, we showed that, compared with existing approaches that utilize relative abundances and ignore the missing batches under the missing-completely-at-random assumption, the mixEMM method achieves more accurate parameter estimation and inference. We applied the method to an iTRAQ proteomics data set from a breast cancer study and identified phosphopeptides differentially expressed between different breast cancer subtypes. The method can be applied to general clustered data with cluster-level nonignorable missing-data mechanisms.
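
The batch-level missingness described in the abstract can be illustrated with a small simulation. The sketch below is not the mixEMM method and does not reproduce the paper's model. It only generates data with a random batch effect and drops whole batches with a probability that depends on the batch-level mean abundance, then compares the mean of the observed batches with the mean of all batches. The function names, the parameter values, and the logistic form of the missing probability are all assumptions made for illustration.

```python
import math
import random


def simulate_batches(n_batches=2000, batch_size=4, mu=0.0,
                     sigma_batch=1.0, sigma_error=0.5,
                     gamma0=0.0, gamma1=2.0, seed=0):
    """Simulate one peptide's log-abundances across multiplex batches.

    Returns a list of (values, observed) pairs. `observed` is a single flag
    per batch: all samples in a batch are either seen or missing together.
    """
    rng = random.Random(seed)
    batches = []
    for _ in range(n_batches):
        # Random batch effect shared by every sample in the batch.
        batch_effect = rng.gauss(0.0, sigma_batch)
        values = [mu + batch_effect + rng.gauss(0.0, sigma_error)
                  for _ in range(batch_size)]
        batch_mean = sum(values) / batch_size
        # ASSUMPTION: logistic link chosen for illustration only.
        # Lower batch-level abundance gives a higher missing probability.
        p_missing = 1.0 / (1.0 + math.exp(gamma0 + gamma1 * batch_mean))
        observed = rng.random() >= p_missing
        batches.append((values, observed))
    return batches


def grand_mean(batches, observed_only):
    """Average abundance over all samples, or over observed batches only."""
    values = [v for vals, obs in batches
              if obs or not observed_only
              for v in vals]
    return sum(values) / len(values)


if __name__ == "__main__":
    data = simulate_batches()
    print("mean over all batches:     ", grand_mean(data, observed_only=False))
    print("mean over observed batches:", grand_mean(data, observed_only=True))
```

Under these assumed settings, low-abundance batches are dropped more often, so the average over observed batches sits above the average over all batches. This is the kind of bias that an analysis ignoring the missing batches under a missing-completely-at-random assumption would inherit.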

