Statistical and Computational Methods for Proteogenomic Data Analysis.

Affiliations

Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Publication information

Methods Mol Biol. 2023;2629:271-303. doi: 10.1007/978-1-0716-2986-4_13.

PMID:36929082

Abstract

Proteins are the functional molecules for almost all cellular and biological processes, and they are also the targets of most drugs. Because proteins are subject to complex, multilevel regulation, their abundance levels do not correlate well with their mRNA expression levels. The structure, activity, and functional roles of proteins are also affected by posttranslational modifications (PTMs), which correlate even less with mRNA expression levels than protein abundances do. Comprehensive characterization of proteomic data is critical for understanding the molecular and cellular mechanisms of biological systems and for developing new therapeutics. Current large-scale proteomic profiling technologies, such as mass spectrometry, identify peptides and proteins and quantify them in relative terms, and the resulting data are vulnerable to outliers, batch effects, and nonrandom missingness. To support high-quality proteomic data analysis, we first introduce a data preprocessing and quality control pipeline that includes normalization, outlier detection and removal, batch effect identification and handling, and missing data imputation. We then describe several statistical methods that leverage well-processed proteomic data to generate scientific discoveries, especially through integration with genomics and transcriptomics. These methods cover association analysis, network construction, clustering, and cell-type deconvolution. To demonstrate them, we use the proteogenomic data from the lung squamous cell carcinoma study of the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and provide sample code for data access and analysis.
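
The preprocessing steps named in the abstract can be illustrated with a minimal sketch. It assumes a proteins-by-samples matrix of log2 abundance ratios, with NaN marking missing values. The function names, the thresholds (a robust z-score cutoff of 3 and at most 50% missingness per protein), and the row-minimum imputation rule are all hypothetical choices made for illustration. They are not the chapter's actual pipeline. Batch effect handling is omitted.

```python
import numpy as np


def median_normalize(x):
    """Median-center each sample (column) of a log2-scale proteins-by-samples matrix, ignoring NaNs."""
    return x - np.nanmedian(x, axis=0, keepdims=True)


def flag_outlier_samples(x, z_cut=3.0):
    """Return a boolean mask of samples whose spread (MAD) is extreme relative to the other samples."""
    # Per-sample median absolute deviation, ignoring missing values.
    col_mad = np.nanmedian(np.abs(x - np.nanmedian(x, axis=0, keepdims=True)), axis=0)
    # Robust z-score of each sample's MAD across samples
    # (1.4826 scales the MAD to match the SD under normality).
    center = np.median(col_mad)
    scale = 1.4826 * np.median(np.abs(col_mad - center))
    if scale == 0:
        return np.zeros(x.shape[1], dtype=bool)
    return np.abs(col_mad - center) / scale > z_cut


def filter_missing(x, max_missing_frac=0.5):
    """Drop proteins (rows) missing in more than max_missing_frac of samples."""
    keep = np.mean(np.isnan(x), axis=1) <= max_missing_frac
    return x[keep], keep


def impute_row_min(x):
    """Fill each protein's missing values with its smallest observed value.

    Hypothetical left-censoring assumption: values are taken to be missing
    because they fell below the detection limit (missing not at random).
    """
    out = x.copy()
    row_min = np.nanmin(out, axis=1)
    rows, cols = np.where(np.isnan(out))
    out[rows, cols] = row_min[rows]
    return out


def preprocess(x, max_missing_frac=0.5, z_cut=3.0):
    """Normalize, remove outlier samples, filter sparse proteins, then impute (batch correction not included)."""
    x = median_normalize(x)
    outliers = flag_outlier_samples(x, z_cut)
    x = x[:, ~outliers]
    x, kept_rows = filter_missing(x, max_missing_frac)
    return impute_row_min(x), outliers, kept_rows


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(6, 5))
    data[2, :4] = np.nan  # this protein is mostly missing and should be filtered out
    clean, outliers, kept = preprocess(data)
    print(clean.shape, outliers, kept)
```

Median-centering is used here because it is robust to a minority of strongly changing proteins. The row-minimum rule is only one way to encode nonrandom missingness. In real data, the imputation method should match how the missing values actually arise.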

Similar articles

Statistical and Computational Methods for Proteogenomic Data Analysis.

Methods Mol Biol. 2023;2629:271-303. doi: 10.1007/978-1-0716-2986-4_13.

Proteogenomic interrogation of cancer cell lines: an overview of the field.

Expert Rev Proteomics. 2021 Mar;18(3):221-232. doi: 10.1080/14789450.2021.1914594. Epub 2021 Apr 20.

Proteomic variations of esophageal squamous cell carcinoma revealed by combining RNA-seq proteogenomics and G-PTM search strategy.

Heliyon. 2020 Aug 29;6(8):e04813. doi: 10.1016/j.heliyon.2020.e04813. eCollection 2020 Aug.

Current status of clinical proteogenomics in lung cancer.

Expert Rev Proteomics. 2019 Sep;16(9):761-772. doi: 10.1080/14789450.2019.1654861. Epub 2019 Aug 21.

Identification of new protein coding sequences and signal peptidase cleavage sites of Helicobacter pylori strain 26695 by proteogenomics.

J Proteomics. 2013 Jun 28;86:27-42. doi: 10.1016/j.jprot.2013.04.036. Epub 2013 May 9.

Advanced Proteogenomic Analysis Reveals Multiple Peptide Mutations and Complex Immunoglobulin Peptides in Colon Cancer.

J Proteome Res. 2015 Sep 4;14(9):3555-67. doi: 10.1021/acs.jproteome.5b00264. Epub 2015 Jul 21.

Proteogenomics: concepts, applications and computational strategies.

Nat Methods. 2014 Nov;11(11):1114-25. doi: 10.1038/nmeth.3144.

Proteogenomic data and resources for pan-cancer analysis.

Cancer Cell. 2023 Aug 14;41(8):1397-1406. doi: 10.1016/j.ccell.2023.06.009.

PGx: Putting Peptides to BED.

J Proteome Res. 2016 Mar 4;15(3):795-9. doi: 10.1021/acs.jproteome.5b00870. Epub 2015 Dec 18.

Integrative Proteo-genomic Analysis to Construct CNA-protein Regulatory Map in Breast and Ovarian Tumors.

Mol Cell Proteomics. 2019 Aug 9;18(8 suppl 1):S66-S81. doi: 10.1074/mcp.RA118.001229. Epub 2019 Jul 7.

References cited in this article

A global map of associations between types of protein posttranslational modifications and human genetic diseases.

iScience. 2021 Jul 30;24(8):102917. doi: 10.1016/j.isci.2021.102917. eCollection 2021 Aug 20.

A reference-free approach for cell type classification with scRNA-seq.

iScience. 2021 Jul 14;24(8):102855. doi: 10.1016/j.isci.2021.102855. eCollection 2021 Aug 20.

A proteogenomic portrait of lung squamous cell carcinoma.

Cell. 2021 Aug 5;184(16):4348-4371.e40. doi: 10.1016/j.cell.2021.07.016.

A combined approach for single-cell mRNA and intracellular protein expression analysis.

Commun Biol. 2021 May 25;4(1):624. doi: 10.1038/s42003-021-02142-w.

Proteogenomic and metabolomic characterization of human glioblastoma.

Cancer Cell. 2021 Apr 12;39(4):509-528.e20. doi: 10.1016/j.ccell.2021.01.006. Epub 2021 Feb 11.

High-Throughput Large-Scale Targeted Proteomics Assays for Quantifying Pathway Proteins in KT2440.

Front Bioeng Biotechnol. 2020 Dec 2;8:603488. doi: 10.3389/fbioe.2020.603488. eCollection 2020.

Proteogenomic insights into the biology and treatment of HPV-negative head and neck squamous cell carcinoma.

Cancer Cell. 2021 Mar 8;39(3):361-379.e16. doi: 10.1016/j.ccell.2020.12.007. Epub 2021 Jan 7.

Integrated Proteogenomic Characterization across Major Histological Types of Pediatric Brain Cancer.

Cell. 2020 Dec 23;183(7):1962-1985.e31. doi: 10.1016/j.cell.2020.10.044. Epub 2020 Nov 25.

Proteogenomic Characterization Reveals Therapeutic Vulnerabilities in Lung Adenocarcinoma.

Cell. 2020 Jul 9;182(1):200-225.e35. doi: 10.1016/j.cell.2020.06.013.

Robust partial reference-free cell composition estimation from tissue expression.

Bioinformatics. 2020 Jun 1;36(11):3431-3438. doi: 10.1093/bioinformatics/btaa184.
