Proteomics Center, Children's Hospital Boston, Boston, MA, USA.
Bioinformatics. 2010 Mar 15;26(6):791-7. doi: 10.1093/bioinformatics/btq036. Epub 2010 Feb 4.
Mass spectrometry (MS) has become the method of choice for protein/peptide sequence and modification analysis. The technology employs a two-step approach: ionized peptide precursor masses are detected and selected for fragmentation, and the resulting fragment mass spectra are collected for computational analysis. Current precursor selection schemes are based on data- or information-dependent acquisition (DDA/IDA), in which candidate masses for fragmentation are selected by intensity and subsequently placed on a dynamic exclusion list to avoid repeated refragmentation of highly abundant species. DDA/IDA methods do not exploit the valuable information contained in the fractional mass of the high-accuracy precursor mass measurements delivered by current instrumentation.
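The intensity-based selection with dynamic exclusion described above can be illustrated with a minimal sketch. The function name `select_precursors`, its parameters (`top_n`, `exclusion_s`, `tol`) and the data layout are assumptions made for illustration only; they do not reflect any instrument vendor's acquisition software or the software accompanying this study.

```python
def select_precursors(survey_scan, exclusion, now, top_n=3,
                      exclusion_s=30.0, tol=0.01):
    """Pick up to top_n of the most intense precursors that are not excluded.

    survey_scan: list of (mz, intensity) tuples from one survey scan.
    exclusion:   dict mapping an excluded m/z to its expiry time (seconds);
                 it is updated in place.
    now:         current acquisition time in seconds.
    """
    # Remove exclusion entries whose time window has expired.
    for mz in [m for m, expiry in exclusion.items() if expiry <= now]:
        del exclusion[mz]

    selected = []
    # Visit peaks from most to least intense.
    for mz, _intensity in sorted(survey_scan, key=lambda p: p[1], reverse=True):
        if len(selected) == top_n:
            break
        # Skip peaks within the m/z tolerance of an excluded precursor.
        if any(abs(mz - ex) <= tol for ex in exclusion):
            continue
        selected.append(mz)
        exclusion[mz] = now + exclusion_s  # exclude for the next window
    return selected
```

Note that this scheme ranks candidates by intensity alone; the measured mass itself plays no role in the decision beyond matching against the exclusion list.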
We extend previous work suggesting that fractional mass information enables targeted fragmentation of analytes of interest. We introduce a non-linear Random Forest classification and a discrete mapping approach, both of which can be trained to discriminate among arbitrary fractional mass patterns for an arbitrary number of analyte classes. These methods can be used to increase fragmentation efficiency for specific subsets of analytes or to select suitable fragmentation technologies on-the-fly. We show that theoretical generalization error estimates transfer into practical application, and that their quality depends on the accuracy of the prior distribution estimates for the analyte classes. The methods are applied to two real-world proteomics datasets.
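One way to picture a discrete mapping from precursor mass to analyte class is a lookup table over binned mass and fractional mass. The sketch below is an assumption-laden illustration, not the authors' implementation: the class name `FractionalMassMap`, the bin widths, and the prior-weighted scoring rule are all hypothetical choices.

```python
import math
from collections import defaultdict


class FractionalMassMap:
    """Toy lookup table: (mass bin, fractional-mass bin) -> class counts."""

    def __init__(self, mass_bin=100.0, frac_bin=0.05):
        self.mass_bin = mass_bin    # width of the mass window (Da), assumed
        self.frac_bin = frac_bin    # width of the fractional-mass bin, assumed
        self.counts = defaultdict(lambda: defaultdict(int))
        self.class_totals = defaultdict(int)

    def _cell(self, mass):
        frac = mass - math.floor(mass)  # fractional part of the mass
        return (int(mass // self.mass_bin), int(frac // self.frac_bin))

    def fit(self, masses, labels):
        for mass, label in zip(masses, labels):
            self.counts[self._cell(mass)][label] += 1
            self.class_totals[label] += 1
        return self

    def predict(self, mass, priors=None):
        """Return the best-scoring class, or None for an unseen cell."""
        cell_counts = self.counts.get(self._cell(mass))
        if not cell_counts:
            return None
        n = sum(self.class_totals.values())

        def score(label):
            # prior * P(cell | class); the prior defaults to the
            # class frequency observed in the training data.
            prior = priors[label] if priors else self.class_totals[label] / n
            return prior * cell_counts[label] / self.class_totals[label]

        return max(cell_counts, key=score)
```

Because the score multiplies a per-cell likelihood by a class prior, the same table can return different classes for the same mass under different priors, which is one way to see why the accuracy of the prior estimates matters.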
All software used in this study is available from http://software.steenlab.org/fmf.
hanno.steen@childrens.harvard.edu
Supplementary data are available at Bioinformatics online.