Wang Shengze, Feng Shichao, Pan Chongle, Guo Xuan
Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, United States.
School of Computer Science and Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK 73019, United States.
Proceedings (IEEE Int Conf Bioinformatics Biomed). 2022 Dec;2022:287-292. doi: 10.1109/bibm55620.2022.9995401. Epub 2023 Jan 2.
Microbial community proteomics, also termed metaproteomics, investigates all proteins expressed by a microbiota. Tandem mass spectrometry (MS/MS) is the typical method for identifying proteins in metaproteomics; identification involves searching the acquired mass spectra against a protein sequence database. A major post-analysis step is controlling the false discovery rate (FDR), i.e., the ratio of false positives to the total number of annotations. The currently popular target-decoy FDR estimation method treats all peptides and proteins equally and overlooks that they may have different probabilities of being identified. In this study, we report FineFDR, a framework for fine-grained FDR assessment that takes taxonomy information into account. FineFDR groups the identified peptide-spectrum matches (PSMs), peptides, and proteins by taxonomic unit and estimates the FDR in each group separately. Experiments on simulated and real-world data sets show that FineFDR achieves higher precision and yields more peptide and protein identifications than state-of-the-art methods such as Comet, Percolator, TIDD, and Tailor. FineFDR is freely available under the GNU GPL at https://github.com/Biocomputing-Research-Group/FDR.
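The grouping idea described in the abstract can be illustrated with a minimal Python sketch. It is not the FineFDR implementation. It assumes each PSM is a dictionary with hypothetical keys `score`, `is_decoy`, and `taxon`, and it assumes the common target-decoy estimate FDR ≈ (number of decoys) / (number of targets) above a score cutoff. The function names `accept_at_fdr` and `accept_by_group` and the toy data are invented for illustration only.

```python
from collections import defaultdict


def accept_at_fdr(psms, threshold):
    """Return target PSMs accepted at the given estimated FDR.

    Assumption: FDR at a score cutoff is estimated as decoys / targets
    among all PSMs scoring at or above that cutoff.
    """
    ranked = sorted(psms, key=lambda p: p["score"], reverse=True)
    targets = decoys = 0
    cutoff = 0  # length of the longest prefix whose estimated FDR passes
    for i, psm in enumerate(ranked, start=1):
        if psm["is_decoy"]:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= threshold:
            cutoff = i
    return [p for p in ranked[:cutoff] if not p["is_decoy"]]


def accept_by_group(psms, threshold, key="taxon"):
    """Estimate the FDR separately within each group (here: taxonomic unit)."""
    groups = defaultdict(list)
    for psm in psms:
        groups[psm[key]].append(psm)
    return {g: accept_at_fdr(members, threshold) for g, members in groups.items()}


if __name__ == "__main__":
    # Toy, made-up PSMs: (score, is_decoy, taxon). Not real data.
    rows = [
        (10, False, "A"), (9, False, "A"), (8, False, "A"),
        (7, True, "A"), (6, False, "A"),
        (5, False, "B"), (4, True, "B"), (3, True, "B"),
    ]
    toy = [{"score": s, "is_decoy": d, "taxon": t} for s, d, t in rows]
    for taxon, accepted in sorted(accept_by_group(toy, threshold=0.2).items()):
        print(taxon, len(accepted))
```

In this sketch each group receives its own score cutoff, so a taxon in which decoys score relatively high is filtered more strictly than one in which they do not. How FineFDR actually forms groups at the PSM, peptide, and protein levels is described in the paper and its repository, not here.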