Khan Salman, Uddin Islam, Noor Sumaiya, AlQahtani Salman A, Ahmad Nijad
Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan.
Business and Management Sciences Department, Purdue University, West Lafayette, IN, USA.
BMC Med Genomics. 2025 Mar 29;18(1):58. doi: 10.1186/s12920-025-02131-6.
N6-methyladenine (6mA) is a DNA modification that plays a crucial role in epigenetic regulation, gene expression, and various other biological processes. With advances in sequencing technologies and computational biology, there is increasing interest in accurate methods for 6mA site identification, both to enable early detection and to clarify the biological significance of the modification. Despite the rapid progress of machine learning in bioinformatics, accurately detecting 6mA sites remains a challenge because existing approaches have limited generalizability and efficiency. In this study, we present Deep-N6mA, a novel Deep Neural Network (DNN) model that incorporates optimal hybrid features for precise 6mA site identification. The proposed framework captures complex patterns from DNA sequences through a comprehensive feature extraction process that combines k-mer, Dinucleotide-based Cross Covariance (DCC), Trinucleotide-based Auto Covariance (TAC), Pseudo Single Nucleotide Composition (PseSNC), Pseudo Dinucleotide Composition (PseDNC), and Pseudo Trinucleotide Composition (PseTNC) descriptors. To improve computational efficiency and eliminate irrelevant or noisy features, an unsupervised Principal Component Analysis (PCA) algorithm is applied to reduce the hybrid feature space to its most informative components. A multilayer DNN then serves as the classifier that identifies N6-methyladenine sites. The robustness and generalizability of Deep-N6mA were validated using fivefold cross-validation on two benchmark datasets. Experimental results show that Deep-N6mA achieves an average accuracy of 97.70% on the F. vesca dataset and 95.75% on the R. chinensis dataset, outperforming existing methods by 4.12% and 4.55%, respectively. These findings underscore the effectiveness of Deep-N6mA as a reliable tool for early 6mA site detection, contributing to epigenetic research and to the field of computational biology.
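As an illustration of the simplest descriptor named in the abstract, the following is a minimal sketch of k-mer frequency encoding for a DNA sequence. It is not the authors' implementation: the function names `kmer_frequencies` and `hybrid_kmer_vector`, the choice of k = 1, 2, 3, and the policy of skipping windows that contain ambiguous bases are all assumptions made for this example. The other descriptors (DCC, TAC, PseSNC, PseDNC, PseTNC), the PCA step, and the DNN classifier are not shown.

```python
from itertools import product

ALPHABET = "ACGT"  # canonical DNA bases; other symbols are treated as ambiguous


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies of `seq` in lexicographic k-mer order.

    Hypothetical helper for illustration only; windows that contain a
    character outside ACGT (for example N) are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        # Sequence shorter than k, or no valid window: return an all-zero vector.
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def hybrid_kmer_vector(seq, ks=(1, 2, 3)):
    """Concatenate k-mer frequency vectors for several k (assumed values)."""
    features = []
    for k in ks:
        features.extend(kmer_frequencies(seq, k))
    return features


if __name__ == "__main__":
    vector = hybrid_kmer_vector("ACGTACGTAA")
    print(len(vector))  # 4 + 16 + 64 entries for k = 1, 2, 3
```

Each sequence maps to a fixed-length vector regardless of its length, which is what allows descriptors of this kind to be concatenated with others and passed to a dimensionality reduction step and a classifier.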