Evaluation of the False Discovery Rate in Library-Free Search by DIA-NN Using Human Proteome.

Author information

Gu Kongxin, Kenko Masanaga, Ogawa Koji, Goshima Naoki, Masuda Takeshi, Ito Shingo, Ohtsuki Sumio

Affiliations

Department of Pharmaceutical Microbiology, Graduate School of Pharmaceutical Sciences, Kumamoto University, Kumamoto 862-0973, Japan.

Department of Pharmaceutical Microbiology, School of Pharmacy, Kumamoto University, Kumamoto 862-0973, Japan.

Publication information

J Proteome Res. 2025 Aug 1;24(8):3874-3883. doi: 10.1021/acs.jproteome.5c00036. Epub 2025 Jul 18.

DOI:10.1021/acs.jproteome.5c00036

PMID:40679152

Abstract

Recently, deep-learning-based spectral libraries have gained increasing attention. Several data-independent acquisition (DIA) software tools have integrated this feature, known as a library-free search, thereby making DIA analysis more accessible. However, controlling the false discovery rate (FDR) is challenging owing to the vast amount of peptide information in libraries. In this study, we introduced a stringent method to evaluate FDR control using DIA software. Recombinant proteins were synthesized from full-length human cDNA libraries and analyzed by using liquid chromatography-mass spectrometry and DIA software. The results were compared with known protein sequences to calculate the FDR. Notably, we compared the identification performance of DIA-NN versions 1.8.1, 1.9.2, and 2.1.0. Versions 1.9.2 and 2.1.0 identified more peptides than version 1.8.1 and also used a more conservative identification approach, thus significantly improving the FDR control. Across the synthesized recombinant protein mixtures, the average FDR at the precursor level was 0.538% for version 1.8.1, 0.389% for version 1.9.2, and 0.385% for version 2.1.0; at the protein level, the FDRs were 2.85%, 1.81%, and 1.81%, respectively. Collectively, our data set provides valuable insights for comparing FDR controls across DIA software and aiding bioinformaticians in enhancing their tools.
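The abstract says the FDR was calculated by comparing identifications with the known protein sequences, but it does not give the exact formula. As a rough illustration only, the sketch below assumes the simplest reading: an identification counts as false when it is absent from the set of sequences known to be in the sample, and the empirical FDR is the share of such identifications among all identifications. The function name `empirical_fdr` and the toy peptide strings are hypothetical and are not taken from the paper or from DIA-NN.

```python
def empirical_fdr(identified, expected):
    """Return the fraction of identified items that are absent from the expected set.

    Assumption: an identification is "false" if it does not occur in the set of
    sequences known to be present in the sample. The paper's exact counting
    rules may differ.
    """
    identified = set(identified)
    if not identified:
        return 0.0  # no identifications, so no false discoveries
    false_hits = identified - set(expected)
    return len(false_hits) / len(identified)


# Hypothetical toy data: three peptides truly present, four reported.
expected_peptides = {"PEPTIDEK", "SAMPLER", "PROTEINK"}
identified_peptides = {"PEPTIDEK", "SAMPLER", "PROTEINK", "UNEXPECTEDK"}

fdr = empirical_fdr(identified_peptides, expected_peptides)
print(f"Empirical FDR: {fdr:.2%}")
```

The same function could be applied at the precursor level or the protein level by passing precursor identifiers or protein accessions instead of peptide strings.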
