Using multivariate mixed-effects selection models for analyzing batch-processed proteomics data with non-ignorable missingness.

Affiliations

Department of Public Health Sciences, University of Chicago, 5841 S. Maryland Ave., Chicago, IL, USA.

Department of Genetics and Genomics Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 770 Lexington Avenue, New York, NY, USA.

Publication information

Biostatistics. 2019 Oct 1;20(4):648-665. doi: 10.1093/biostatistics/kxy022.

DOI: 10.1093/biostatistics/kxy022
PMID: 29939200
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6797056/
Abstract

In quantitative proteomics, mass tag labeling techniques have been widely adopted in mass spectrometry experiments. These techniques allow peptides (short amino acid sequences) and proteins from multiple samples of a batch to be detected and quantified in a single experiment, and as such greatly improve the efficiency of protein profiling. However, the batch processing of samples also results in severe batch effects and non-ignorable missing data occurring at the batch level. Motivated by the breast cancer proteomic data from the Clinical Proteomic Tumor Analysis Consortium, in this work we developed two tailored multivariate MIxed-effects SElection models (mvMISE) to jointly analyze multiple correlated peptides/proteins in labeled proteomics data, accounting for the batch effects and the non-ignorable missingness. By taking a multivariate approach, we can borrow information across multiple peptides of the same protein or multiple proteins from the same biological pathway, and thus achieve better statistical efficiency and biological interpretation. The two models account for different correlation structures among a group of peptides or proteins. Specifically, to model multiple peptides from the same protein, we employed a factor-analytic random effects structure to characterize the high and similar correlations among peptides. To model biological dependence among multiple proteins in a functional pathway, we introduced a graphical lasso penalty on the error precision matrix, and implemented an efficient algorithm based on the alternating direction method of multipliers. Simulations demonstrated the advantages of the proposed models. Applying the proposed methods to the motivating data set, we identified phosphoproteins and biological pathways that showed different activity patterns in triple-negative breast tumors versus other breast tumors. The proposed methods can also be applied to other high-dimensional multivariate analyses based on clustered data with or without non-ignorable missingness.
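
To make the graphical lasso step mentioned in the abstract more concrete, here is a minimal sketch of the standard alternating direction method of multipliers (ADMM) iteration for an L1-penalized precision matrix. It is not the authors' mvMISE implementation: it ignores the random effects and the missing-data mechanism entirely and works on a complete-data sample covariance matrix. The function names, the choice to penalize only off-diagonal entries, and the defaults (`rho`, `n_iter`, `tol`) are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(a, kappa):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(a) * np.maximum(np.abs(a) - kappa, 0.0)


def graphical_lasso_admm(S, lam, rho=1.0, n_iter=500, tol=1e-6):
    """Sketch of ADMM for: min -logdet(Theta) + tr(S Theta) + lam * ||Theta_offdiag||_1.

    S   : (p, p) sample covariance matrix (assumed complete data).
    lam : penalty weight on off-diagonal precision entries (illustrative choice).
    Returns the sparse iterate Z, which equals Theta at convergence.
    """
    p = S.shape[0]
    Z = np.eye(p)
    U = np.zeros((p, p))                     # scaled dual variable
    off_diag = ~np.eye(p, dtype=bool)
    for _ in range(n_iter):
        # Theta-update: solve rho*Theta - inv(Theta) = rho*(Z - U) - S
        # in closed form through an eigendecomposition.
        eigval, eigvec = np.linalg.eigh(rho * (Z - U) - S)
        theta_eig = (eigval + np.sqrt(eigval ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (eigvec * theta_eig) @ eigvec.T
        # Z-update: soft-threshold the off-diagonal entries only.
        Z_old = Z
        A = Theta + U
        Z = A.copy()
        Z[off_diag] = soft_threshold(A[off_diag], lam / rho)
        # Dual update.
        U = U + Theta - Z
        # Stop when primal and dual residuals are both small.
        primal = np.linalg.norm(Theta - Z)
        dual = rho * np.linalg.norm(Z - Z_old)
        if primal < tol and dual < tol:
            break
    return Z


if __name__ == "__main__":
    # Simulated example: five "proteins" whose true precision matrix is a chain.
    rng = np.random.default_rng(0)
    p = 5
    theta_true = 2.0 * np.eye(p) - 0.8 * (np.eye(p, k=1) + np.eye(p, k=-1))
    X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(theta_true), size=2000)
    S = np.cov(X, rowvar=False)
    print(np.round(graphical_lasso_admm(S, lam=0.05), 2))
```

Zeros in the estimated precision matrix correspond to pairs of variables that are conditionally independent given the others, which is why a penalty of this kind suits sparse dependence among proteins in a pathway. In the paper's setting the penalty is applied to the error precision matrix inside the mixed-effects selection model, so a step of this kind would sit within a larger estimation procedure rather than run on a raw covariance matrix.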
