Planatscher Hannes, Supper Jochen, Poetz Oliver, Stoll Dieter, Joos Thomas, Templin Markus F, Zell Andreas
University of Tübingen, Center for Bioinformatics, Sand 1, D-72076 Tübingen, Germany.
Algorithms Mol Biol. 2010 Jun 25;5:28. doi: 10.1186/1748-7188-5-28.
Mass spectrometry (MS)-based protein profiling has become one of the key technologies in biomedical research and biomarker discovery. One bottleneck in MS-based protein analysis is sample preparation and an efficient fractionation step to reduce the complexity of biological samples, which are too complex to be analyzed directly by MS. Sample preparation strategies that reduce the complexity of tryptic digests using immunoaffinity-based methods have been shown to substantially increase throughput and sensitivity in proteomic mass spectrometry. The limiting factor for such immunoaffinity-based approaches is the availability of appropriate peptide-specific capture antibodies. Recent developments, in which subsets of peptides sharing short identical terminal sequences are enriched with antibodies directed against short terminal epitopes, promise a significant gain in efficiency.
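To make the idea of peptides that share short terminal sequences concrete, the sketch below performs an idealized in-silico trypsin digest (cleavage after K or R but not before P, no missed cleavages) and groups the resulting peptides by their terminal k-mer. The epitope length `k=4`, the choice of the C-terminus, the minimum peptide length and all function names are illustrative assumptions, not parameters taken from the paper; the protein sequences are made up.

```python
import re
from collections import defaultdict

def tryptic_digest(sequence: str) -> list[str]:
    """Idealized trypsin digest: cleave after K or R unless the next residue is P.

    Assumes complete digestion (no missed cleavages). Needs Python 3.7+,
    where re.split accepts zero-width patterns.
    """
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]

def group_by_terminal_epitope(proteins: dict[str, str], k: int = 4,
                              terminus: str = "C", min_len: int = 6) -> dict[str, set[str]]:
    """Map each terminal k-mer (candidate epitope) to the proteins yielding a peptide that carries it.

    k, terminus and min_len are hypothetical choices; min_len stands in for a
    real peptide filter and should be >= k.
    """
    groups: dict[str, set[str]] = defaultdict(set)
    for protein_id, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            if len(peptide) < min_len:
                continue  # drop short peptides (hypothetical filter criterion)
            epitope = peptide[-k:] if terminus == "C" else peptide[:k]
            groups[epitope].add(protein_id)
    return dict(groups)

# Toy sequences: both proteins yield a tryptic peptide ending in "LSER".
proteins = {"P1": "MAGICSEQLKAAAAGGLSER", "P2": "GGGGGLSERWWWWWWK"}
groups = group_by_terminal_epitope(proteins)
print(groups)
```

Here a single antibody against the hypothetical epitope `LSER` would enrich one peptide from each of the two proteins, which is the efficiency gain the paragraph describes.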
We show that the minimal set of terminal epitopes needed to cover a target protein list can be found by formulating the task as a set cover problem, preceded by a filtering pipeline that excludes peptides and target epitopes with undesirable properties.
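One way to write this down, in our own notation rather than the paper's: let $P$ be the target protein list, $E$ the candidate terminal epitopes that survive filtering, and $S_e \subseteq P$ the set of target proteins with at least one retained peptide carrying epitope $e$. With a binary variable $x_e = 1$ meaning that an antibody against $e$ is selected, the minimal epitope set is the optimum of the standard set cover integer program:

```latex
\begin{align*}
\min_{x} \quad & \sum_{e \in E} x_e \\
\text{s.t.} \quad & \sum_{e \in E \,:\, p \in S_e} x_e \ge 1 \qquad \forall p \in P, \\
& x_e \in \{0, 1\} \qquad \forall e \in E.
\end{align*}
```

The program is feasible only if every protein in $P$ lies in at least one $S_e$, so proteins left without any admissible epitope after filtering have to be removed from $P$ first.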
For small datasets (a few hundred proteins), the problem can be solved to optimality with moderate computational effort using commercial or free solvers. Larger datasets, such as full proteomes, require heuristics.
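The trade-off between exact solving and heuristics can be illustrated on a toy instance. The sketch below contrasts an exhaustive search, used here only as a stand-in for an ILP solver (it is exponential and viable only for tiny inputs), with the classic greedy set cover heuristic, which at each step picks the epitope covering the most still-uncovered proteins and is guaranteed to use at most about $1 + \ln|P|$ times the optimal number of epitopes. Greedy is shown as a representative heuristic, not necessarily the one used in the paper; the epitope strings and protein IDs are hypothetical placeholders.

```python
from itertools import combinations

def greedy_set_cover(universe: set[str], subsets: dict[str, set[str]]) -> list[str]:
    """Greedy heuristic: repeatedly pick the epitope that covers the most uncovered proteins."""
    uncovered = set(universe)
    chosen: list[str] = []
    while uncovered:
        # max() keeps the first maximal key, so sorted order breaks ties deterministically.
        best = max(sorted(subsets), key=lambda e: len(subsets[e] & uncovered))
        gain = subsets[best] & uncovered
        if not gain:
            raise ValueError("some proteins are not covered by any epitope")
        chosen.append(best)
        uncovered -= gain
    return chosen

def exact_set_cover(universe: set[str], subsets: dict[str, set[str]]) -> list[str]:
    """Optimal cover by exhaustive search over increasing sizes (exponential; tiny inputs only)."""
    keys = sorted(subsets)
    for size in range(len(keys) + 1):
        for combo in combinations(keys, size):
            if set().union(*(subsets[e] for e in combo)) >= universe:
                return list(combo)
    raise ValueError("no cover exists")

# Hypothetical epitopes and the target proteins each one would capture.
subsets = {
    "AAAK": {"P1", "P2", "P3"},
    "CCCR": {"P4", "P5", "P6"},
    "DDDK": {"P1", "P2", "P4", "P5"},
}
targets = {"P1", "P2", "P3", "P4", "P5", "P6"}
print(greedy_set_cover(targets, subsets))  # ['DDDK', 'AAAK', 'CCCR']
print(exact_set_cover(targets, subsets))   # ['AAAK', 'CCCR']
```

On this instance greedy is lured by the largest set, `DDDK`, and ends up with three epitopes where two suffice. This is the price paid for a method whose running time stays polynomial on proteome-scale inputs.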