Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan.
Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo, 102-0083, Japan.
Plant Mol Biol. 2020 May;103(1-2):225-234. doi: 10.1007/s11103-020-00988-y. Epub 2020 Mar 5.
DNA N6-methyladenine (6mA) is one of the most important epigenetic modifications and is involved in regulating gene expression. With the rapid growth of DNA sequence data in public databases, accurate identification of 6mA sites is essential for understanding the underlying molecular mechanisms. Because experimental approaches are time-consuming and costly, a computational model that identifies 6mA sites rapidly and accurately is desirable. To the best of our knowledge, i6mA-Fuse is the first computational model proposed for predicting 6mA sites in Rosaceae genomes, specifically those of Rosa chinensis and Fragaria vesca. We implemented five encoding schemes, namely mononucleotide binary encoding (MBE), dinucleotide binary encoding, k-space spectral nucleotide composition, k-mer composition (Kmer), and electron-ion interaction pseudopotential (EIIP) encoding, and used each one to build a single-encoding random forest (RF) model. i6mA-Fuse then combines the predicted probability scores of these five RF models with a linear regression model. The resulting species-specific i6mA-Fuse models achieved AUCs of 0.982 and 0.978 and MCCs of 0.869 and 0.858 on the independent test datasets of R. chinensis and F. vesca, respectively. In the F. vesca-specific model, MBE and EIIP contributed 75% and 25% of the total prediction; in the R. chinensis-specific model, Kmer, MBE, and EIIP contributed 15%, 65%, and 20%, respectively. To support high-throughput identification of DNA 6mA sites, i6mA-Fuse is publicly accessible at https://kurata14.bio.kyutech.ac.jp/i6mA-Fuse/.
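The encode-then-fuse pipeline described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' code. It implements three of the five encodings (MBE, Kmer, EIIP) and a least-squares linear fusion of per-model probabilities. Several details are assumptions: the A, C, G, T order of the one-hot bits, the per-position variant of EIIP encoding, the default k = 3, and a fusion model without an intercept. All function names (mbe, eiip, kmer, fit_fusion_weights, fused_score) are hypothetical. The per-model probabilities would come from trained RF classifiers, for example scikit-learn's RandomForestClassifier.predict_proba, and that step is not reproduced here.

```python
from itertools import product

BASES = "ACGT"
# Standard per-nucleotide EIIP values used in DNA signal-processing work.
EIIP_VALUES = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def mbe(seq):
    """Mononucleotide binary encoding: a 4-bit one-hot vector per base (A, C, G, T order assumed)."""
    vec = []
    for base in seq:
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def eiip(seq):
    """Per-position EIIP values (one assumed variant of EIIP-based encoding)."""
    return [EIIP_VALUES[base] for base in seq]


def kmer(seq, k=3):
    """Normalized frequencies of all 4**k k-mers (k=3 is an arbitrary default here)."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:  # windows containing non-ACGT symbols are not counted
            counts[window] += 1
    return [counts[km] / total for km in kmers]


def fit_fusion_weights(prob_rows, labels):
    """Least-squares weights w (no intercept) so that sum_j w[j] * p[j] approximates the label.

    prob_rows has one row per training site and one column per single-encoding model.
    Solves the normal equations (P^T P) w = P^T y with Gauss-Jordan elimination.
    """
    m = len(prob_rows[0])
    # Augmented matrix [P^T P | P^T y].
    a = [
        [sum(r[i] * r[j] for r in prob_rows) for j in range(m)]
        + [sum(r[i] * y for r, y in zip(prob_rows, labels))]
        for i in range(m)
    ]
    for col in range(m):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, m), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(m):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * z for x, z in zip(a[row], a[col])]
    # The left block is now diagonal, so each weight is a simple ratio.
    return [a[i][m] / a[i][i] for i in range(m)]


def fused_score(probs, weights):
    """Fused 6mA score: weighted sum of the per-model probabilities."""
    return sum(w * p for w, p in zip(weights, probs))
```

As a usage example, fitting on the toy rows (1, 0), (0, 1), (1, 1) with targets 0.75, 0.25, and 1.0 recovers weights of about [0.75, 0.25]. In practice the rows would presumably be RF probabilities for training sites and the targets their 0/1 labels. Solving the normal equations by hand keeps the sketch self-contained. A library solver such as scikit-learn's LinearRegression(fit_intercept=False) would be the more usual choice. How the reported percentage contributions are derived from the fitted coefficients is not specified here.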