Accurate prediction of DNA N4-methylcytosine sites via boost-learning various types of sequence features.

Affiliations

Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, PO Box 123, Broadway, Sydney, NSW 2007, Australia.

Data Science Institute, University of Technology Sydney, PO Box 123, Broadway, Sydney, NSW 2007, Australia.

Publication information

BMC Genomics. 2020 Sep 11;21(1):627. doi: 10.1186/s12864-020-07033-8.

DOI:10.1186/s12864-020-07033-8
PMID:32917152
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7488740/
Abstract

BACKGROUND

DNA N4-methylcytosine (4mC) is a critical epigenetic modification with various roles in the restriction-modification system. Because experimental detection in the laboratory is costly, computational methods that combine sequence characteristics with machine learning algorithms have been explored to identify 4mC sites from DNA sequences. However, state-of-the-art methods have limited performance because they lack effective sequence features and choose their learning algorithms ad hoc. This paper aims to propose a new sequence feature space and a machine learning algorithm with a feature selection scheme to address the problem.
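To make "sequence characteristics" concrete, the following minimal sketch turns a DNA window into a fixed-length numeric vector of k-mer frequencies, one common family of sequence features. It is an illustration only: the abstract does not specify the paper's feature space, and the function name `kmer_frequencies` and the choice of k are assumptions.

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies over ACGT, in lexicographic k-mer order.

    Hypothetical helper for illustration; not the feature space of the paper.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# A 2-mer encoding always has 4**2 = 16 entries, whatever the sequence length.
features = kmer_frequencies("ACGTAC", k=2)
print(len(features))
```

Vectors of this kind can be fed to any standard classifier; a real predictor would typically concatenate several feature types.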

RESULTS

The feature importance score distributions in the datasets of six species are first reported and analyzed. The impact of feature selection on model performance is then evaluated by independent testing on benchmark datasets, where ACC and MCC after feature selection increase by 2.3% to 9.7% and by 0.05 to 0.19, respectively. The proposed method is compared with three state-of-the-art predictors using an independent test and 10-fold cross-validation, and it outperforms them on all datasets, improving ACC by 3.02% to 7.89% and MCC by 0.06 to 0.15 in the independent test. Two detailed case studies confirm the excellent overall performance of the proposed method, which correctly identified 24 of 26 4mC sites in the C. elegans gene and 126 of 137 4mC sites in the D. melanogaster gene.
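For readers unfamiliar with the metrics, the sketch below gives the standard definitions of accuracy (ACC) and the Matthews correlation coefficient (MCC) from confusion-matrix counts, and works out the fraction of true 4mC sites recovered in the two case studies. The case-study counts come from the abstract; because the abstract reports only identified sites out of true sites, that fraction is recall (sensitivity), not ACC. The function names are illustrative.

```python
import math


def accuracy(tp, tn, fp, fn):
    """ACC = (TP + TN) / all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def recall(tp, fn):
    """Fraction of true sites that were identified."""
    return tp / (tp + fn)


# Case-study counts from the abstract: identified sites vs. missed sites.
print(f"C. elegans:      {recall(24, 26 - 24):.1%}")     # 92.3%
print(f"D. melanogaster: {recall(126, 137 - 126):.1%}")  # 92.0%
```

MCC ranges from -1 to 1 and stays informative when classes are imbalanced, which is one reason it is commonly reported alongside ACC.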

CONCLUSIONS

The results show that the proposed feature space and learning algorithm with feature selection improve the performance of DNA 4mC prediction on the benchmark datasets. The two case studies demonstrate the effectiveness of the method in practical situations.
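One simple way to picture importance-based feature selection is to rank features by a score and keep the top k, as sketched below. This is a generic illustration, not the selection scheme of the paper: the importance values are made up, and in practice they would come from a trained model such as a boosted tree ensemble. The names `select_top_features` and `project` are hypothetical.

```python
def select_top_features(importances, k):
    """Return the indices of the k highest-scoring features, in ascending index order."""
    ranked = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:k])


def project(rows, keep):
    """Restrict every feature vector to the selected column indices."""
    return [[row[i] for i in keep] for row in rows]


# Hypothetical importance scores for four features (illustrative values only).
scores = [0.10, 0.50, 0.05, 0.35]
keep = select_top_features(scores, k=2)
print(keep)                                 # [1, 3]
print(project([[1.0, 2.0, 3.0, 4.0]], keep))  # [[2.0, 4.0]]
```

The value of k is a tuning choice; comparing held-out performance before and after selection, as the Results section does, is the usual way to judge it.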

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a34f/7488740/6fd035c6fdee/12864_2020_7033_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a34f/7488740/a579ef4b7264/12864_2020_7033_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a34f/7488740/6801dd4dc50f/12864_2020_7033_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a34f/7488740/62a1126d495c/12864_2020_7033_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a34f/7488740/511c3fe0807a/12864_2020_7033_Fig5_HTML.jpg
