Discussion on common data analysis strategies used in MS-based proteomics.

Affiliation

Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal.

Publication information

Proteomics. 2011 Feb;11(4):604-19. doi: 10.1002/pmic.201000404. Epub 2011 Jan 17.

Abstract

Current proteomics technology is limited in resolving the proteome complexity of biological systems. The main issue at stake is to increase throughput and spectra quality so that spatiotemporal dimensions, population parameters and the complexity of protein modifications on a quantitative scale can be considered. MS-based proteomics and protein arrays are the main players in large-scale proteome analysis and an integration of these two methodologies is powerful but presently not sufficient for detailed quantitative and spatiotemporal proteome characterization. Improvements of instrumentation for MS-based proteomics have been achieved recently resulting in data sets of approximately one million spectra which is a large step in the right direction. The corresponding raw data range from 50 to 100 Gb and are frequently made available. Multidimensional LC-MS data sets have been demonstrated to identify and quantitate 2000-8000 proteins from whole cell extracts. The analysis of the resulting data sets requires several steps from raw data processing, to database-dependent search, statistical evaluation of the search result, quantitative algorithms and statistical analysis of quantitative data. A large number of software tools have been proposed for the above-mentioned tasks. However, it is not the aim of this review to cover all software tools, but rather discuss common data analysis strategies used by various algorithms for each of the above-mentioned steps in a non-redundant approach and to argue that there are still some areas which need improvements.
