i6mA-Vote: Cross-Species Identification of DNA N6-Methyladenine Sites in Plant Genomes Based on Ensemble Learning With Voting.

Authors

Teng Zhixia, Zhao Zhengnan, Li Yanjuan, Tian Zhen, Guo Maozu, Lu Qianzi, Wang Guohua

Affiliations

College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.

College of Electrical and Information Engineering, Quzhou University, Quzhou, China.

Publication

Front Plant Sci. 2022 Feb 14;13:845835. doi: 10.3389/fpls.2022.845835. eCollection 2022.

Abstract

DNA N6-methyladenine (6mA) is a common epigenetic modification that plays significant roles in the growth and development of plants. Identifying 6mA sites is crucial for elucidating the functions of 6mA. In this article, a novel model named i6mA-vote is developed to predict 6mA sites in plants. First, DNA sequences were encoded into six feature vectors using diverse strategies based on the density, physicochemical properties, and position of nucleotides. To find the best encoding strategy, the feature vectors were compared on several machine learning classifiers. The results suggested that the position of nucleotides has a significant positive effect on 6mA site identification. Thus, the dinucleotide one-hot strategy, which describes the position characteristics of nucleotides well, was employed to extract DNA features in our method. Second, DNA sequences of Rosaceae were randomly divided into a training dataset and a test dataset. Finally, i6mA-vote was constructed by combining five different base classifiers under a majority voting strategy and was trained on the Rosaceae training dataset. i6mA-vote was evaluated on the task of predicting 6mA sites in the genomes of Rosaceae, rice, and Arabidopsis separately. On Rosaceae, i6mA-vote achieved an accuracy (ACC) of 0.955, a Matthews correlation coefficient (MCC) of 0.909, a sensitivity (SN) of 0.955, and a specificity (SP) of 0.954. The same indicators, in the order ACC, MCC, SN, SP, were 0.882, 0.774, 0.961, and 0.803 on rice, and 0.798, 0.617, 0.666, and 0.929 on Arabidopsis. According to these indicators, our method is effective and performs better than the other methods compared. The results also show that i6mA-vote performs well not only in intraspecies 6mA site prediction but also in cross-species prediction. Moreover, the specificity is distinctly lower than the sensitivity on rice, while the opposite holds on Arabidopsis. This may result from sequence similarity among Rosaceae, rice, and Arabidopsis.
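
The pipeline described above (position-aware encoding, majority voting, and the four reported indicators) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the exact encoding layout, so the sketch assumes overlapping dinucleotides, each mapped to a 16-dimensional one-hot vector. The function names `dinucleotide_one_hot`, `majority_vote`, and `evaluation_metrics` are hypothetical, and the five base classifiers are represented only by their predicted labels because the abstract does not name them.

```python
from collections import Counter
from itertools import product
from math import sqrt

NUCLEOTIDES = "ACGT"
# All 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ..., TT
DINUCLEOTIDES = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]
DINUC_INDEX = {dinuc: i for i, dinuc in enumerate(DINUCLEOTIDES)}


def dinucleotide_one_hot(seq):
    """Encode a DNA sequence as concatenated 16-dim one-hot vectors.

    Assumption: overlapping dinucleotides are taken left to right, so a
    sequence of length L yields 16 * (L - 1) features and each block of
    16 features corresponds to one position in the sequence.
    """
    seq = seq.upper()
    features = []
    for i in range(len(seq) - 1):
        vec = [0] * 16
        idx = DINUC_INDEX.get(seq[i:i + 2])
        if idx is not None:  # pairs with unknown bases (e.g. N) stay all-zero
            vec[idx] = 1
        features.extend(vec)
    return features


def majority_vote(predictions):
    """Return the label predicted by most base classifiers.

    With five binary base classifiers there is never a tie.
    """
    return Counter(predictions).most_common(1)[0][0]


def evaluation_metrics(tp, tn, fp, fn):
    """Compute ACC, MCC, SN, and SP from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)  # sensitivity: fraction of true 6mA sites recovered
    sp = tn / (tn + fp)  # specificity: fraction of non-6mA sites rejected
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "MCC": mcc, "SN": sn, "SP": sp}


if __name__ == "__main__":
    x = dinucleotide_one_hot("ACGTA")
    print(len(x))                          # 16 features per dinucleotide
    print(majority_vote([1, 1, 0, 1, 0]))  # votes from five base classifiers
    print(evaluation_metrics(tp=90, tn=80, fp=20, fn=10))  # made-up counts
```

Because every dinucleotide occupies its own block of 16 features, the position of each nucleotide pair is preserved in the feature vector, which is the property the abstract credits for the strategy's performance. The counts passed to `evaluation_metrics` are invented for illustration and are unrelated to the results reported in the paper.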

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f3e/8882731/b9d2cb1e8873/fpls-13-845835-g001.jpg
