Development of an experiment-split method for benchmarking the generalization of a PTM site predictor: Lysine methylome as an example.

Affiliations

School of Basic Medicine, Qingdao University, Qingdao, China.

College of Life Science, Qingdao University, Qingdao, China.

Publication information

PLoS Comput Biol. 2021 Dec 8;17(12):e1009682. doi: 10.1371/journal.pcbi.1009682. eCollection 2021 Dec.

Abstract

Many computational classifiers have been developed to predict different types of post-translational modification (PTM) sites. Their performance is typically measured using cross-validation or an independent test, in which experimental data from different sources are mixed and randomly split into training and test sets. However, the self-reported performance of most classifiers measured this way is generally higher than their performance when applied to new experimental data. This suggests that the cross-validation method overestimates the generalization ability of a classifier. Here, we propose a generalization estimate method, dubbed the experiment-split test, in which the experimental sources for the training set differ from those for the test set, so that the test set simulates data derived from a new experiment. We took the prediction of the lysine methylome (Kme) as an example and developed a deep learning-based Kme site predictor (called DeepKme) with outstanding performance. We assessed the experiment-split test by comparing it with the cross-validation method and found that the performance measured using the experiment-split test is lower than that measured by cross-validation. Because the test data of the experiment-split method are derived from an independent experimental source, this method can reflect the generalization of the predictor. Therefore, we believe that the experiment-split method can be applied to benchmark the practical performance of a given PTM model. DeepKme is freely accessible via https://github.com/guoyangzou/DeepKme.
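
The difference between a pooled random split and an experiment split can be illustrated with a minimal Python sketch. It is not taken from DeepKme and does not reproduce the authors' pipeline: the record layout (a "source" field naming the experiment that reported each site), the function names, and the toy data are all hypothetical, and the sketch assumes each candidate site is attributed to exactly one experimental source.

```python
import random


def random_split(samples, test_fraction=0.2, seed=0):
    """Conventional split: pool all sources, shuffle, then cut.

    Samples from the same experiment can land in both train and test.
    """
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def experiment_split(samples, held_out_sources):
    """Experiment split: all samples from held-out sources form the test set.

    No experimental source contributes to both train and test.
    """
    held_out = set(held_out_sources)
    train = [s for s in samples if s["source"] not in held_out]
    test = [s for s in samples if s["source"] in held_out]
    return train, test


def leave_one_experiment_out(samples):
    """Yield (source, train, test), holding out each source in turn."""
    for source in sorted({s["source"] for s in samples}):
        train, test = experiment_split(samples, [source])
        yield source, train, test


# Hypothetical toy records: site identifier, experimental source, label
# (1 = methylated lysine, 0 = unmodified lysine).
samples = [
    {"site": "P1_K10", "source": "exp_A", "label": 1},
    {"site": "P1_K55", "source": "exp_A", "label": 0},
    {"site": "P2_K23", "source": "exp_A", "label": 1},
    {"site": "P3_K7", "source": "exp_B", "label": 1},
    {"site": "P3_K90", "source": "exp_B", "label": 0},
    {"site": "P4_K41", "source": "exp_C", "label": 1},
    {"site": "P4_K62", "source": "exp_C", "label": 0},
    {"site": "P5_K15", "source": "exp_C", "label": 0},
]

if __name__ == "__main__":
    for source, train, test in leave_one_experiment_out(samples):
        train_sources = sorted({s["source"] for s in train})
        print(f"held out {source}: train on {train_sources}, "
              f"{len(train)} train / {len(test)} test samples")
```

In this sketch, a model would be trained on the `train` list and scored on the `test` list of each fold; because the held-out source never appears in training, the score approximates performance on data from a new experiment, which is the property the abstract attributes to the experiment-split test.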
