Zou Hongliang, Yang Fan, Yin Zhijian
School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330003, China.
Biophys Chem. 2022 Feb;281:106717. doi: 10.1016/j.bpc.2021.106717. Epub 2021 Nov 14.
DNase I hypersensitive sites (DHSs) are important for locating gene regulatory elements such as promoters, enhancers, and silencers. Discriminating DHSs from non-DHSs is therefore crucial. Although traditional methods such as Southern blotting and DNase-seq can identify DHSs, they are time-consuming, laborious, and expensive. To address these issues, researchers have turned to computational approaches. In this study, we developed a novel predictor called iDHS-DT to identify DHSs. In this predictor, DNA sequences were first represented by the physicochemical properties (PC) of DNA dinucleotides and trinucleotides. Then, three descriptors (auto-covariance, cross-covariance, and discrete wavelet transform) were used to extract features from the PC matrix. Next, the least absolute shrinkage and selection operator (LASSO) algorithm was employed to remove irrelevant and redundant features. Finally, the selected features were fed into a support vector machine (SVM) to distinguish DHSs from non-DHSs. The proposed method achieved classification accuracies of 97.64% and 98.22% on the two datasets, respectively. Compared with existing predictors, the proposed model shows a significant improvement in classification performance. Experimental results demonstrate that the proposed method is powerful in identifying DHSs.
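The auto-covariance and cross-covariance descriptors named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the normalization used here (dividing by the number of position pairs at each lag) is an assumption based on the common definition of these descriptors. The property table `TOY_PROPERTIES` and the helper names `encode`, `auto_covariance`, `cross_covariance`, and `acc_features` are hypothetical, and the property values are arbitrary placeholders rather than real physicochemical data. The discrete wavelet transform, LASSO selection, and SVM steps are not shown.

```python
from itertools import product

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]

# Placeholder property tables: NOT real physicochemical values.
# Each dinucleotide gets an arbitrary number purely so the example runs.
TOY_PROPERTIES = {
    "prop_a": {d: float(i) for i, d in enumerate(DINUCS)},
    "prop_b": {d: float((i * 7) % 16) for i, d in enumerate(DINUCS)},
}


def encode(seq, properties):
    """Map a DNA sequence to one numeric series per property (a toy 'PC matrix')."""
    dinucs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [[table[d] for d in dinucs] for table in properties.values()]


def cross_covariance(x, y, lag):
    """Average product of centered x at position i and centered y at i + lag."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    total = sum((x[i] - mean_x) * (y[i + lag] - mean_y) for i in range(n - lag))
    return total / (n - lag)


def auto_covariance(x, lag):
    """Auto-covariance is cross-covariance of a series with itself."""
    return cross_covariance(x, x, lag)


def acc_features(seq, properties, max_lag):
    """Concatenate auto- and cross-covariance values for lags 1..max_lag."""
    matrix = encode(seq, properties)
    feats = []
    for lag in range(1, max_lag + 1):
        for i, row_i in enumerate(matrix):
            feats.append(auto_covariance(row_i, lag))
            for j, row_j in enumerate(matrix):
                if i != j:
                    feats.append(cross_covariance(row_i, row_j, lag))
    return feats


features = acc_features("ACGTACGGTCA", TOY_PROPERTIES, max_lag=2)
```

With P properties and a maximum lag of G, this sketch yields G × P² values per sequence (P auto-covariance and P × (P − 1) cross-covariance terms at each lag), so every sequence maps to a fixed-length vector regardless of its length.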