Algorithms and databases.

Author information

Martens Lennart, Apweiler Rolf

Affiliation

EMBL Outstation-Hinxton, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

Publication information

Methods Mol Biol. 2009;564:245-59. doi: 10.1007/978-1-60761-157-8_14.

DOI:10.1007/978-1-60761-157-8_14

PMID:19544027

Abstract

The capacity of proteomics methods and mass spectrometry instrumentation to generate data has grown substantially in recent years. This growth in data volume has in turn led to an increased reliance on software to identify peptide or protein sequences from the recorded mass spectra. Diverse algorithms can be applied to the processing of these data, each performing a specific task such as spectrum quality filtering, spectral clustering and merging, assigning a sequence to a spectrum, or assessing the validity of these assignments. The key algorithms in mass spectral processing pipelines are those that assign a sequence to a spectrum. The most commonly used variants of these depend crucially on the information contained in the sequence database that they use as a basis for identification. Since these sequence databases are constructed in different ways and can therefore vary substantially in the amount and type of data they contain, they are also discussed here.

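The dependence of sequence assignment on the sequence database can be illustrated with a minimal sketch. It is not taken from the chapter: the function names (`tryptic_peptides`, `peptide_mass`, `candidates`), the toy database, and the 10 ppm tolerance are all assumptions chosen for illustration. The sketch digests each database protein in silico with a simplified trypsin rule and returns the peptides whose monoisotopic mass matches an observed precursor mass. Real search engines go further and score candidates against the fragment ions of the spectrum.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per intact peptide


def tryptic_peptides(protein):
    """Simplified trypsin rule: cleave after K or R unless followed by P.

    Missed cleavages are ignored (an assumption made for brevity).
    """
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        is_last = i == len(protein) - 1
        if is_last or (residue in "KR" and protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def candidates(database, observed_mass, tol_ppm=10.0):
    """Return (protein_id, peptide) pairs within tol_ppm of observed_mass."""
    hits = []
    for protein_id, sequence in database.items():
        for peptide in tryptic_peptides(sequence):
            error_ppm = (peptide_mass(peptide) - observed_mass) / observed_mass * 1e6
            if abs(error_ppm) <= tol_ppm:
                hits.append((protein_id, peptide))
    return hits


# Hypothetical toy database; real databases hold many thousands of entries.
database = {"P1": "AAKPGRMLK", "P2": "GGSK"}
observed = peptide_mass("MLK")  # stand-in for a measured precursor mass
print(candidates(database, observed))
print(candidates({"P2": "GGSK"}, observed))  # source protein absent from database
```

The second call returns no candidates because the protein that produced the peptide is missing from the database. This is the sense in which such algorithms can only identify what the chosen database contains.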