Kremer Lukas P M, Leufken Johannes, Oyunchimeg Purevdulam, Schulze Stefan, Fufezan Christian
Institute of Plant Biology and Biotechnology, University of Muenster, Schlossplatz 8, 48143 Münster, Germany.
J Proteome Res. 2016 Mar 4;15(3):788-94. doi: 10.1021/acs.jproteome.5b00860. Epub 2016 Jan 13.
Proteomics data integration has become a broad field, with a variety of programs offering innovative algorithms to analyze increasing amounts of data. Unfortunately, this software diversity causes many problems as soon as data are analyzed with more than one algorithm for the same task. Although combining multiple peptide identification algorithms has been shown to yield more robust results, unified approaches have emerged only recently. Moreover, workflows that, for example, aim to optimize search parameters or employ cascaded-style searches can only be made accessible if data analysis becomes not only unified but also, most importantly, scriptable. Here we introduce Ursgal, a Python interface to many commonly used bottom-up proteomics tools and to additional auxiliary programs. Complex workflows can thus be composed in a few lines of Python. Ursgal is easily extensible, and we have made accessible through its Python interface several database search engines (X!Tandem, OMSSA, MS-GF+, Myrimatch, MS Amanda), statistical postprocessing algorithms (qvality, Percolator), and one algorithm that combines statistically postprocessed outputs from multiple search engines ("combined FDR"). Furthermore, we have implemented a new algorithm ("combined PEP") that combines multiple search engines employing elements of "combined FDR", PeptideShaker, and Bayes' theorem.
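To illustrate how Bayes' theorem can merge posterior error probabilities (PEPs) that several search engines report for the same peptide-spectrum match, the following is a minimal sketch in plain Python. It assumes that the engines err independently and that each PEP is already calibrated. The function name `combine_peps` is hypothetical; this is not Ursgal's API and not necessarily the published "combined PEP" algorithm, which also draws on elements of "combined FDR" and PeptideShaker.

```python
from math import prod


def combine_peps(peps):
    """Combine per-engine PEPs for one peptide-spectrum match.

    Illustrative naive-Bayes combination (an assumption, not Ursgal's code):
    each PEP is treated as an independent estimate of the probability that
    the match is incorrect.
    """
    peps = list(peps)
    if not peps:
        raise ValueError("at least one PEP is required")
    if any(not 0.0 <= p <= 1.0 for p in peps):
        raise ValueError("PEPs must lie in the interval [0, 1]")

    # Probability mass for "all engines are wrong about this match".
    all_wrong = prod(peps)
    # Probability mass for "all engines are right about this match".
    all_right = prod(1.0 - p for p in peps)

    total = all_wrong + all_right
    if total == 0.0:
        # Happens when one engine reports PEP 0 and another reports PEP 1.
        raise ValueError("contradictory PEPs cannot be combined")

    # Normalize so the two hypotheses sum to one.
    return all_wrong / total


if __name__ == "__main__":
    # Two engines that each assign a PEP of 0.1 to the same match.
    print(combine_peps([0.1, 0.1]))
```

Under these assumptions, agreement between engines lowers the combined PEP below either individual value, while a single PEP is returned unchanged.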