Meta-i6mA: an interspecies predictor for identifying DNA N6-methyladenine sites of plant genomes by exploiting informative features in an integrative machine-learning framework.

Affiliations

China Agricultural University, Beijing.

Department of Physiology, Ajou University School of Medicine, Republic of Korea.

Publication information

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa202.

DOI:10.1093/bib/bbaa202

PMID:32910169

Abstract

DNA N6-methyladenine (6mA) is an important epigenetic modification that is involved in various cellular processes. Accurate identification of 6mA sites is a challenging task in genome analysis and contributes to understanding their biological functions. To date, several species-specific machine learning (ML)-based models have been proposed, but the majority of them were not tested on other species. Hence, their practical application to other plant species is quite limited. In this study, we explored 10 different feature encoding schemes, with the goal of capturing key characteristics around 6mA sites. We selected five feature encoding schemes based on physicochemical and position-specific information that possess high discriminative capability. The resultant feature sets were input to six commonly used ML methods (random forest, support vector machine, extremely randomized tree, logistic regression, naïve Bayes and AdaBoost). The Rosaceae genome was employed to train the above classifiers, which generated 30 baseline models. To integrate their individual strengths, we proposed Meta-i6mA, which combines the baseline models using a meta-predictor approach. In extensive independent tests, Meta-i6mA achieved high Matthews correlation coefficient values of 0.918, 0.827 and 0.635 on Rosaceae, rice and Arabidopsis thaliana, respectively, and outperformed the existing predictors. We anticipate that Meta-i6mA can be applied across different plant species. Furthermore, we developed a user-friendly online web server, which is available at http://kurata14.bio.kyutech.ac.jp/Meta-i6mA/.
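The bookkeeping behind the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It shows how five encodings and six classifiers give 30 baseline models, and how the Matthews correlation coefficient (MCC) is computed from a confusion matrix. The encoding labels (`encoding_1` ... `encoding_5`) are placeholders, since the abstract does not name the selected schemes. The `meta_features` helper assumes the common meta-predictor design in which baseline probabilities become the input of a second-level model; the abstract does not spell this out.

```python
from itertools import product
from math import sqrt

# Classifier names as listed in the abstract.
CLASSIFIERS = ["RF", "SVM", "ERT", "LR", "NB", "AdaBoost"]
# Placeholder labels: the abstract does not name the five selected encodings.
ENCODINGS = [f"encoding_{i}" for i in range(1, 6)]


def baseline_model_names():
    """Pair every encoding with every classifier (5 x 6 = 30 baseline models)."""
    return [f"{enc}+{clf}" for enc, clf in product(ENCODINGS, CLASSIFIERS)]


def meta_features(baseline_probs):
    """Arrange per-model probabilities for one sequence in a fixed order.

    Assumption: the meta-predictor takes baseline probabilities as features.
    """
    return [baseline_probs[name] for name in baseline_model_names()]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = 6mA, 0 = non-6mA)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which puts the reported values of 0.918, 0.827 and 0.635 in context.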

Similar articles

i6mA-Fuse: improved and robust prediction of DNA 6 mA sites in the Rosaceae genome by fusing multiple feature representation.

Plant Mol Biol. 2020 May;103(1-2):225-234. doi: 10.1007/s11103-020-00988-y. Epub 2020 Mar 5.

i6mA-Vote: Cross-Species Identification of DNA N6-Methyladenine Sites in Plant Genomes Based on Ensemble Learning With Voting.

Front Plant Sci. 2022 Feb 14;13:845835. doi: 10.3389/fpls.2022.845835. eCollection 2022.

i6mA-Caps: a CapsuleNet-based framework for identifying DNA N6-methyladenine sites.

Bioinformatics. 2022 Aug 10;38(16):3885-3891. doi: 10.1093/bioinformatics/btac434.

i6mA-Pred: identifying DNA N6-methyladenine sites in the rice genome.

Bioinformatics. 2019 Aug 15;35(16):2796-2800. doi: 10.1093/bioinformatics/btz015.

i6mA-stack: A stacking ensemble-based computational prediction of DNA N6-methyladenine (6mA) sites in the Rosaceae genome.

Genomics. 2021 Jan;113(1 Pt 2):582-592. doi: 10.1016/j.ygeno.2020.09.054. Epub 2020 Oct 1.

i6mA-VC: A Multi-Classifier Voting Method for the Computational Identification of DNA N6-methyladenine Sites.

Interdiscip Sci. 2021 Sep;13(3):413-425. doi: 10.1007/s12539-021-00429-4. Epub 2021 Apr 8.

i6mA-DNCP: Computational Identification of DNA N6-Methyladenine Sites in the Rice Genome Using Optimized Dinucleotide-Based Features.

Genes (Basel). 2019 Oct 20;10(10):828. doi: 10.3390/genes10100828.

i4mC-ROSE, a bioinformatics tool for the identification of DNA N4-methylcytosine sites in the Rosaceae genome.

Int J Biol Macromol. 2020 Aug 15;157:752-758. doi: 10.1016/j.ijbiomac.2019.12.009. Epub 2019 Dec 2.

Ense-i6mA: Identification of DNA N6-Methyladenine Sites Using XGB-RFE Feature Selection and Ensemble Machine Learning.

IEEE/ACM Trans Comput Biol Bioinform. 2024 Nov-Dec;21(6):1842-1854. doi: 10.1109/TCBB.2024.3421228. Epub 2024 Dec 10.

Cited by

N7-methylguanosine (m7G) modification in breast cancer: clinical significances and molecular mechanisms.

Cancer Cell Int. 2025 Aug 12;25(1):303. doi: 10.1186/s12935-025-03859-y.

DNA Methylation Recognition Using Hybrid Deep Learning with Dual Nucleotide Visualization Fusion Feature Encoding.

Interdiscip Sci. 2025 Jul 16. doi: 10.1007/s12539-025-00737-z.

Identification of DNA N6-methyladenine modifications in the rice genome with a fine-tuned large language model.

Front Plant Sci. 2025 Jun 25;16:1626539. doi: 10.3389/fpls.2025.1626539. eCollection 2025.

A genetic algorithm-based ensemble model for efficiently identifying interleukin 6 inducing peptides.

Sci Rep. 2025 Jul 1;15(1):21213. doi: 10.1038/s41598-025-05491-2.

HD-6mAPred: a hybrid deep learning approach for accurate prediction of N6-methyladenine sites in plant species.

PeerJ. 2025 May 15;13:e19463. doi: 10.7717/peerj.19463. eCollection 2025.

Ensemble learning-based predictor for driver synonymous mutation with sequence representation.

PLoS Comput Biol. 2025 Jan 6;21(1):e1012744. doi: 10.1371/journal.pcbi.1012744. eCollection 2025 Jan.

RiceSNP-ABST: a deep learning approach to identify abiotic stress-associated single nucleotide polymorphisms in rice.

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae702.

AlzGenPred - CatBoost-based gene classifier for predicting Alzheimer's disease using high-throughput sequencing data.

Sci Rep. 2024 Dec 5;14(1):30294. doi: 10.1038/s41598-024-82208-x.

Human essential gene identification based on feature fusion and feature screening.

IET Syst Biol. 2024 Dec;18(6):227-237. doi: 10.1049/syb2.12105. Epub 2024 Nov 22.

iDNA-ITLM: An interpretable and transferable learning model for identifying DNA methylation.

PLoS One. 2024 Oct 31;19(10):e0301791. doi: 10.1371/journal.pone.0301791. eCollection 2024.
