Lee Sangjeong, Park Heejin, Kim Hyunwoo
Department of Computer Science, Hanyang University, Seoul, 06978, Republic of Korea.
Center for Supercomputing Applications, Korea Institute of Science and Technology Information, Daejeon, 34141, Republic of Korea.
Proteome Sci. 2021 Sep 18;19(1):11. doi: 10.1186/s12953-021-00179-7.
The target-decoy strategy effectively estimates the false-discovery rate (FDR) by creating a decoy database with a size identical to that of the target database. Decoy databases are created by various methods, such as the reverse, pseudo-reverse, shuffle, pseudo-shuffle, and de Bruijn methods. Depending on which decoy database is used, the FDR is sometimes over- or underestimated, because target databases differ in their ratio of redundant peptides and, as a result, the numbers of unique (non-redundant) peptides in the target and decoy databases can differ.
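As a rough illustration of why redundancy matters, the sketch below builds reverse, pseudo-reverse, and shuffle decoys for a toy target database and counts unique peptides. It is not the authors' implementation: the function names are hypothetical, digestion is simplified to cleavage after K or R with no proline rule and no missed cleavages, and the pseudo-shuffle and de Bruijn methods are omitted.

```python
import random

CLEAVAGE = "KR"  # assumption: simplified tryptic rule, cleave after K or R


def digest(protein, cleavage=CLEAVAGE):
    """Split a protein into peptides, cutting after each cleavage residue."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in cleavage:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def reverse_decoy(protein):
    """Reverse the whole protein sequence."""
    return protein[::-1]


def pseudo_reverse_decoy(protein, cleavage=CLEAVAGE):
    """Reverse each peptide but keep its C-terminal cleavage residue in place."""
    out = []
    for pep in digest(protein, cleavage):
        if pep[-1] in cleavage:
            out.append(pep[:-1][::-1] + pep[-1])
        else:
            out.append(pep[::-1])
    return "".join(out)


def shuffle_decoy(protein, rng):
    """Randomly permute the residues of the whole protein."""
    residues = list(protein)
    rng.shuffle(residues)
    return "".join(residues)


def unique_peptides(proteins):
    """Set of distinct peptides over all proteins in a database."""
    return {pep for prot in proteins for pep in digest(prot)}


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy target database with a duplicated (fully redundant) protein.
    targets = ["MAKCDEFGRHILK", "MAKCDEFGRHILK", "STVKWYAAR"]
    decoys = {
        "reverse": [reverse_decoy(p) for p in targets],
        "pseudo-reverse": [pseudo_reverse_decoy(p) for p in targets],
        "shuffle": [shuffle_decoy(p, rng) for p in targets],
    }
    print("target unique peptides:", len(unique_peptides(targets)))
    for name, db in decoys.items():
        print(name, "unique peptides:", len(unique_peptides(db)))
```

Deterministic methods map identical target proteins to identical decoy proteins, so redundancy carries over. The shuffle method permutes each copy independently, so a redundant target database can yield more unique decoy peptides than unique target peptides.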
We used two protein databases (the UniProt Saccharomyces cerevisiae protein database and the UniProt human protein database) to compare the FDRs of various decoy databases. When the ratio of redundant peptides in the target database is low, the FDR is not overestimated by any decoy construction method. However, if the ratio of redundant peptides in the target database is high, the FDR is overestimated when the (pseudo) shuffle decoy database is used. Additionally, the human and S. cerevisiae six-frame translation databases, which are large databases, showed outcomes similar to those from the UniProt human protein database.
When (pseudo) shuffle decoy databases are used, the FDR must be estimated using the correction factor proposed by Elias and Gygi or the one proposed by Kim et al.
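The sketch below shows how a multiplicative correction factor might enter a target-decoy FDR estimate. It rests on assumptions not stated in this abstract: the uncorrected FDR is taken as decoy hits divided by target hits, and the illustrative factor is the ratio of unique target peptides to unique decoy peptides. The exact factors of Elias and Gygi and of Kim et al. should be taken from the cited works; the function names here are hypothetical.

```python
def estimate_fdr(n_target_hits, n_decoy_hits, correction=1.0):
    """Target-decoy FDR estimate scaled by a correction factor.

    Assumption: uncorrected FDR = decoy hits / target hits above a score
    threshold. `correction` is a user-supplied factor (1.0 = no correction).
    """
    if n_target_hits <= 0:
        raise ValueError("n_target_hits must be positive")
    if n_decoy_hits < 0:
        raise ValueError("n_decoy_hits must be non-negative")
    return correction * n_decoy_hits / n_target_hits


def unique_peptide_ratio(n_unique_target, n_unique_decoy):
    """Illustrative correction factor (an assumption, not the cited formula):
    unique target peptides divided by unique decoy peptides."""
    if n_unique_decoy <= 0:
        raise ValueError("n_unique_decoy must be positive")
    return n_unique_target / n_unique_decoy


if __name__ == "__main__":
    # Hypothetical counts: the shuffle decoy holds more unique peptides
    # than the redundant target database it was built from.
    factor = unique_peptide_ratio(n_unique_target=80_000, n_unique_decoy=100_000)
    print("uncorrected FDR:", estimate_fdr(1000, 20))
    print("corrected FDR:  ", estimate_fdr(1000, 20, correction=factor))
```

Under these assumptions, a decoy database with more unique peptides than the target gives a factor below 1, which scales down an otherwise overestimated FDR.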