Awan Muaaz Gul, Awan Abdullah Gul, Saeed Fahad
Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Al-Khwarizmi Institute of Computer Science (KICS), University of Engineering & Technology (UET), Lahore, Pakistan.
Netw Model Anal Health Inform Bioinform. 2021;10. doi: 10.1007/s13721-021-00298-3. Epub 2021 Mar 26.
Protein sequencing algorithms process data generated by a variety of instruments under diverse experimental conditions. Currently, there is no way to predict the accuracy of an algorithm for a given data set. Most published algorithms and their associated software have been evaluated on a limited number of experimental data sets. However, these performance evaluations do not cover the complete search space that the algorithm and the software might encounter in the real world. To this end, we present a database of simulated spectra that can be used to benchmark any spectra-to-peptide search engine. We demonstrate the usability of this database by benchmarking two popular peptide sequencing engines. We show that the accuracy of peptide deductions varies widely, and that a complete quality profile of a given algorithm can be useful for practitioners and algorithm developers. All benchmarking data is available at https://users.cs.fiu.edu/~fsaeed/Benchmark.html.