Development of an experiment-split method for benchmarking the generalization of a PTM site predictor: Lysine methylome as an example.

Affiliations

School of Basic Medicine, Qingdao University, Qingdao, China.

College of Life Science, Qingdao University, Qingdao, China.

Publication information

PLoS Comput Biol. 2021 Dec 8;17(12):e1009682. doi: 10.1371/journal.pcbi.1009682. eCollection 2021 Dec.

DOI: 10.1371/journal.pcbi.1009682

PMID: 34879076

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8687584/

Abstract

Many computational classifiers have been developed to predict different types of post-translational modification (PTM) sites. Their performance is usually measured by cross-validation or an independent test, in which experimental data from different sources are pooled and randomly split into training and test sets. However, the self-reported performance of most classifiers measured this way is generally higher than their performance on new experimental data, which suggests that cross-validation overestimates a classifier's generalization ability. Here, we propose a method for estimating generalization, dubbed the experiment-split test, in which the experimental sources of the training set differ from those of the test set, so that the test set simulates data derived from a new experiment. Taking the prediction of the lysine methylome (Kme) as an example, we developed a deep learning-based Kme site predictor, DeepKme, with outstanding performance. We assessed the experiment-split test by comparing it with cross-validation and found that the performance measured by the experiment-split test is lower than that measured by cross-validation. Because the test data in the experiment-split method come from an independent experimental source, this method can reflect the generalization of the predictor. We therefore believe that the experiment-split method can be applied to benchmark the practical performance of a given PTM model. DeepKme is freely accessible at https://github.com/guoyangzou/DeepKme.
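The contrast between a pooled random split and an experiment split can be illustrated with a minimal Python sketch. It is not taken from the DeepKme code base: the record layout, the toy peptides, the source IDs (exp_A, exp_B, exp_C), and all function names are hypothetical, and the leave-one-experiment-out loop is only one assumed way to choose which sources to hold out.

```python
import random

# Hypothetical records: (peptide window, label, experimental source ID).
# Label 1 = methylated lysine, 0 = unmodified lysine (toy data only).
SITES = [
    ("AAKRT", 1, "exp_A"),
    ("GGKLS", 0, "exp_A"),
    ("TTKPE", 1, "exp_B"),
    ("LLKDV", 0, "exp_B"),
    ("MMKQW", 1, "exp_C"),
    ("CCKHY", 0, "exp_C"),
]


def random_split(records, test_fraction=0.33, seed=0):
    """Conventional split: pool all sources, shuffle, then cut."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def experiment_split(records, test_sources):
    """Experiment split: hold out every record from the given sources."""
    held_out = set(test_sources)
    train = [r for r in records if r[2] not in held_out]
    test = [r for r in records if r[2] in held_out]
    return train, test


def leave_one_experiment_out(records):
    """Yield (held_out_source, train, test), one fold per source."""
    for source in sorted({r[2] for r in records}):
        train, test = experiment_split(records, [source])
        yield source, train, test


if __name__ == "__main__":
    for source, train, test in leave_one_experiment_out(SITES):
        # No training record shares an experimental source with the test set.
        assert source not in {r[2] for r in train}
        print(source, "train:", len(train), "test:", len(test))
```

With a random split, records from one experiment can land on both sides of the cut. With the experiment split, the test set contains only sources the model has never seen, which is the property the abstract relies on to estimate generalization.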
