A Bioinformatics Tool for the Prediction of DNA N6-Methyladenine Modifications Based on Feature Fusion and Optimization Protocol.

Author information

Cai Jianhua, Wang Donghua, Chen Riqing, Niu Yuzhen, Ye Xiucai, Su Ran, Xiao Guobao, Wei Leyi

Affiliations

Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering, Minjiang University, Fuzhou, China.

College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China.

Publication information

Front Bioeng Biotechnol. 2020 Jun 4;8:502. doi: 10.3389/fbioe.2020.00502. eCollection 2020.

DOI:10.3389/fbioe.2020.00502

PMID:32582654

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7287168/

Abstract

DNA N6-methyladenine (6mA) is closely involved in various biological processes. Identifying the genome-wide distribution of 6mA modifications is important for understanding their functions in depth. In recent years, various experimental and computational methods have been proposed for this purpose. However, existing methods do not provide 6mA prediction that is both accurate and fast. In this study, we present 6mAPred-FO, a bioinformatics tool that enables researchers to make predictions from sequences alone. To capture the characteristics of 6mA sites, we integrate sequence-order information with nucleotide positional specificity information for feature encoding, and further improve the feature representation capacity with an analysis of variance (ANOVA)-based feature optimization protocol. Experimental results show that this feature protocol significantly improves predictive performance. Further feature analysis shows that the sequence-order information and the positional specificity information are complementary, which contributes to the performance improvement. The improvement is also due to the feature optimization protocol, which effectively captures the most informative features from the original feature space. Moreover, benchmarking comparisons demonstrate that 6mAPred-FO outperforms several existing predictors. Finally, we established a web server that implements the proposed method for researchers' convenience, currently available at http://server.malab.cn/6mAPred-FO.
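The abstract names two feature families (sequence-order and position-specific) and an ANOVA-based optimization step, but gives no exact encodings. The sketch below is only an illustration of that general idea, not the 6mAPred-FO implementation. Overlapping dinucleotide frequencies stand in for sequence-order information, per-position one-hot indicators stand in for positional specificity, and a one-way ANOVA F statistic ranks the fused features. All function names and the choice of encodings are assumptions made for this example.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def dinucleotide_frequencies(seq):
    """Assumed sequence-order stand-in: frequencies of the 16 overlapping dinucleotides."""
    counts = {d: 0 for d in DINUCLEOTIDES}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing ambiguous bases such as N
            counts[pair] += 1
    total = max(len(seq) - 1, 1)
    return [counts[d] / total for d in DINUCLEOTIDES]


def positional_one_hot(seq):
    """Assumed position-specific stand-in: four binary indicators per position."""
    return [1.0 if base == n else 0.0 for base in seq for n in NUCLEOTIDES]


def fuse_features(seq):
    """Feature fusion by simple concatenation of the two encodings."""
    return dinucleotide_frequencies(seq) + positional_one_hot(seq)


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across the label groups."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return 0.0
    grand_mean = sum(values) / n
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                  for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2
                 for g in groups.values() for v in g)
    if within == 0.0:
        # Perfect separation gives an unbounded F; a constant feature gives 0.
        return float("inf") if between > 0.0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def rank_features(sequences, labels):
    """Return feature indices sorted by descending F score, plus the raw scores."""
    matrix = [fuse_features(s) for s in sequences]
    scores = [anova_f_score([row[j] for row in matrix], labels)
              for j in range(len(matrix[0]))]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores


if __name__ == "__main__":
    # Toy, made-up sequences of equal length; labels 1 = 6mA site, 0 = non-site.
    seqs = ["GGATC", "CGATC", "TTACG", "CCAGT"]
    labels = [1, 1, 0, 0]
    order, scores = rank_features(seqs, labels)
    print("top 5 feature indices:", order[:5])
```

In practice, the top-ranked subset would be passed to a classifier, and the number of features kept would be tuned by cross-validation. The sequences here are invented and carry no biological meaning.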


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7470/7287168/4ad8fc8bd5ea/fbioe-08-00502-g0001.jpg

Similar articles

MM-6mAPred: identifying DNA N6-methyladenine sites based on Markov model.

Bioinformatics. 2020 Jan 15;36(2):388-392. doi: 10.1093/bioinformatics/btz556.

Meta-i6mA: an interspecies predictor for identifying DNA N6-methyladenine sites of plant genomes by exploiting informative features in an integrative machine-learning framework.

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa202.

iDNA6mA-Rice: A Computational Tool for Detecting N6-Methyladenine Sites in Rice.

Front Genet. 2019 Sep 10;10:793. doi: 10.3389/fgene.2019.00793. eCollection 2019.

Critical evaluation of web-based DNA N6-methyladenine site prediction tools.

Brief Funct Genomics. 2021 Jul 17;20(4):258-272. doi: 10.1093/bfgp/elaa028.

iRNA5hmC: The First Predictor to Identify RNA 5-Hydroxymethylcytosine Modifications Using Machine Learning.

Front Bioeng Biotechnol. 2020 Mar 31;8:227. doi: 10.3389/fbioe.2020.00227. eCollection 2020.

i6mA-stack: A stacking ensemble-based computational prediction of DNA N6-methyladenine (6mA) sites in the Rosaceae genome.

Genomics. 2021 Jan;113(1 Pt 2):582-592. doi: 10.1016/j.ygeno.2020.09.054. Epub 2020 Oct 1.

SNNRice6mA: A Deep Learning Method for Predicting DNA N6-Methyladenine Sites in Rice Genome.

Front Genet. 2019 Oct 11;10:1071. doi: 10.3389/fgene.2019.01071. eCollection 2019.

Iterative feature representations improve N4-methylcytosine site prediction.

Bioinformatics. 2019 Dec 1;35(23):4930-4937. doi: 10.1093/bioinformatics/btz408.

i6mA-Pred: identifying DNA N6-methyladenine sites in the rice genome.

Bioinformatics. 2019 Aug 15;35(16):2796-2800. doi: 10.1093/bioinformatics/btz015.

Cited by

Long extrachromosomal circular DNA identification by fusing sequence-derived features of physicochemical properties and nucleotide distribution patterns.

Sci Rep. 2024 Apr 24;14(1):9466. doi: 10.1038/s41598-024-57457-5.

Identifying target ion channel-related genes to construct a diagnosis model for insulinoma.

Front Genet. 2023 Sep 12;14:1181307. doi: 10.3389/fgene.2023.1181307. eCollection 2023.

Computational prediction of species-specific yeast DNA replication origin via iterative feature representation.

Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa304.
