School of Mathematics and Statistics, Xidian University, Xi'an, P. R. China.
School of Science, Xi'an Polytechnic University, Xi'an, P. R. China.
SAR QSAR Environ Res. 2021 Apr;32(4):317-331. doi: 10.1080/1062936X.2021.1895884. Epub 2021 Mar 18.
DNA replication is not only the basis of biological inheritance but also the most fundamental process in all living organisms. It plays a crucial role in the cell-division cycle and in the regulation of gene expression. Hence, accurately identifying origin of replication sites (ORIs) is of great significance for further understanding the regulatory mechanisms of gene expression and for treating genetic diseases. In this paper, a novel, feasible and powerful model named iORI-ENST is designed to identify ORIs. First, features are extracted by combining mono-nucleotide binary encoding with dinucleotide-based spatial autocorrelation. Next, elastic net is used as the feature selection method to obtain the optimal feature set. Stacking learning, which combines random forest, AdaBoost, gradient boosting decision tree, extra trees and support vector machine, is then employed to distinguish ORIs from non-ORIs. Finally, on the two benchmark datasets, iORI-ENST identifies ORI sites with accuracies of 91.41% and 95.07%, respectively. In addition, an independent dataset is used to verify the validity and transferability of the model, on which its accuracy reaches 91.10%. Compared with state-of-the-art methods, our model achieves better performance. The results show that iORI-ENST is a feasible, effective and powerful tool for identifying ORIs. The source code and datasets are available at https://github.com/YingyingYao/iORI-ENST.
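The two feature families named in the abstract can be illustrated with a small sketch. This is not the iORI-ENST code: it assumes one-hot vectors for mono-nucleotide binary encoding, and it uses dinucleotide auto-covariance (DAC), one common form of dinucleotide-based spatial autocorrelation, as a stand-in. `TOY_PROPERTY`, `binary_encoding`, `dinucleotide_auto_covariance` and `encode` are hypothetical names. The property values are placeholders, not real physicochemical indices.

```python
from itertools import product
from statistics import mean

# Mono-nucleotide binary (one-hot) encoding: each base maps to a 4-bit vector.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

# HYPOTHETICAL placeholder property: AA=0.0, AC=1.0, ..., TT=15.0.
# These are NOT real physicochemical indices; substitute a published,
# standardized dinucleotide property table in practice.
TOY_PROPERTY = {a + b: float(i) for i, (a, b) in enumerate(product("ACGT", repeat=2))}


def binary_encoding(seq: str) -> list[int]:
    """Concatenate the one-hot vectors of all bases (length 4 * len(seq))."""
    features: list[int] = []
    for base in seq.upper():
        features.extend(ONE_HOT[base])
    return features


def dinucleotide_auto_covariance(seq: str, prop: dict[str, float], max_lag: int) -> list[float]:
    """Dinucleotide auto-covariance (DAC) of one property for lags 1..max_lag.

    With N = len(seq), P(d_i) the property value of the dinucleotide starting
    at position i, and m the mean of P over all N - 1 dinucleotides:
        DAC(lag) = sum_{i=0}^{N-lag-2} (P(d_i) - m) * (P(d_{i+lag}) - m) / (N - lag - 1)
    """
    seq = seq.upper()
    values = [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]
    m = mean(values)
    features: list[float] = []
    for lag in range(1, max_lag + 1):
        pairs = len(values) - lag  # equals N - lag - 1
        if pairs <= 0:
            raise ValueError(f"sequence too short for lag {lag}")
        cov = sum((values[i] - m) * (values[i + lag] - m) for i in range(pairs)) / pairs
        features.append(cov)
    return features


def encode(seq: str, max_lag: int = 2) -> list[float]:
    """Hypothetical combined feature vector: one-hot bits followed by DAC values."""
    one_hot = [float(x) for x in binary_encoding(seq)]
    return one_hot + dinucleotide_auto_covariance(seq, TOY_PROPERTY, max_lag)


if __name__ == "__main__":
    print(len(encode("ACGTAC")))  # 4 * 6 one-hot bits + 2 DAC lags = 26
```

The one-hot part keeps position-specific base identity, while the autocorrelation part summarizes how a property co-varies along the sequence at a fixed lag. The real model would use several properties and lags, so its feature vector is much longer than in this toy case.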
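The elastic-net feature selection step can be sketched as follows. This minimal version assumes the 0/1 ORI labels are treated as a regression target and keeps the features whose coefficients stay non-zero. It minimizes the same scaled objective that scikit-learn's `ElasticNet` uses, solved here by cyclic coordinate descent in plain Python. The values `alpha=0.1` and `l1_ratio=0.5` are arbitrary illustration choices, not the paper's settings. `elastic_net`, `select_features` and `soft_threshold` are hypothetical names.

```python
def soft_threshold(value: float, threshold: float) -> float:
    """Soft-thresholding operator S(value, threshold) used by the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X: list[list[float]], y: list[float], alpha: float,
                l1_ratio: float, n_iter: int = 200) -> list[float]:
    """Cyclic coordinate descent for
        1/(2n) * ||y - Xw||^2 + alpha * l1_ratio * ||w||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||w||^2,
    with the columns of X and y centered so that no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - col_means[j] for row in X] for j in range(p)]
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]  # y - Xw with w = 0
    w = [0.0] * p
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            z = sum(c * c for c in col) / n
            if z < 1e-12:
                continue  # constant feature: its coefficient stays 0
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1.0 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated coefficient.
                residual = [r - c * delta for r, c in zip(residual, col)]
                w[j] = new_w
    return w


def select_features(X: list[list[float]], y: list[float], alpha: float = 0.1,
                    l1_ratio: float = 0.5, tol: float = 1e-8) -> list[int]:
    """Indices of features whose elastic-net coefficient is non-zero."""
    w = elastic_net(X, y, alpha, l1_ratio)
    return [j for j, wj in enumerate(w) if abs(wj) > tol]
```

The L1 term drives weakly informative coefficients exactly to zero, which is what makes elastic net usable as a selector. The L2 term keeps groups of correlated features, such as overlapping one-hot and autocorrelation columns, from being dropped arbitrarily. In the pipeline described above, the retained columns would then be passed to the stacking ensemble, which is not shown here.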