

Statistical and Computational Methods for Proteogenomic Data Analysis.

Author affiliations

Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Publication information

Methods Mol Biol. 2023;2629:271-303. doi: 10.1007/978-1-0716-2986-4_13.

Abstract

Proteins are the functional molecules for almost all cellular and biological processes. They are also the targets of most drugs. Proteins are subject to complex, multilevel regulation, so their abundance levels do not correlate well with their mRNA expression levels. The structure, activity, and functional roles of proteins are affected by posttranslational modifications (PTMs), which are even less correlated with mRNA expression levels than protein abundances are. Comprehensive characterization of proteomic data is critical for understanding the molecular and cellular mechanisms of biological systems and for developing new therapeutics. Current large-scale proteomic profiling technologies, such as mass spectrometry, provide relative identification of peptides and proteins, with data vulnerable to outliers, batch effects, and nonrandom missingness. To support high-quality proteomic data analysis, we first introduce a data preprocessing and quality control pipeline that includes normalization, outlier detection and removal, batch effect identification and handling, and missing data imputation. We then describe several statistical methods that leverage well-processed proteomic data to generate scientific discoveries, especially when integrated with genomic and transcriptomic data. These methods cover association analysis, network construction, clustering, and cell-type deconvolution. To demonstrate these methods, we use the proteogenomic data from the lung squamous cell carcinoma study of the Clinical Proteomic Tumor Analysis Consortium and provide sample code for data access and analyses.

