Department of Physiology, Ajou University School of Medicine, Republic of Korea.
Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa304.
Deoxyribonucleic acid (DNA) replication is one of the most crucial processes in the cell and must be precisely regulated. Replication is initiated at replication origins (ORIs), so identifying these sites is essential for a deeper understanding of the cellular processes and functions related to the regulation of gene expression. Given the importance of ORIs, several experimental and computational approaches have been developed to predict them. However, existing computational ORI predictors have certain limitations: they build models on a single feature encoding, make limited systematic feature engineering efforts and do not validate model robustness. Hence, we developed a novel species-specific yeast predictor called yORIpred that accurately identifies ORIs in yeast genomes. To develop yORIpred, we first constructed 40 optimal baseline models by exploring eight different sequence-based encodings and five different machine learning classifiers. Subsequently, the predicted probabilities of the 40 models were treated as a novel feature vector, and an iterative feature learning approach was carried out independently with each of the five classifiers. Our systematic analysis revealed that the feature representation learned by the support vector machine algorithm (yORIpred) discriminated between the distributions of ORIs and non-ORIs better than the representations learned by the other four algorithms. Comprehensive benchmarking experiments showed that yORIpred achieved superior and stable performance compared with existing predictors on the same training datasets. Furthermore, independent evaluation confirmed that yORIpred performed best, underscoring the significance of iterative feature representation.
To let users obtain their desired results without any mathematical, statistical or computational hassle, we developed a web server for the yORIpred predictor, which is available at http://thegleelab.org/yORIpred.
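The two-layer idea in the abstract (baseline models whose predicted probabilities become a new feature vector) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: k-mer composition stands in for the eight sequence-based encodings, a toy nearest-centroid scorer stands in for the five classifiers and for the final support vector machine, the probabilities are computed out-of-fold, and the sequences are invented toy data.

```python
import math
from itertools import product

def kmer_freq(seq, k):
    """Normalized k-mer composition (a stand-in sequence-based encoding)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        sub = seq[i:i + k]
        if sub in counts:
            counts[sub] += 1
    total = max(sum(counts.values()), 1)
    return [counts[m] / total for m in kmers]

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class CentroidModel:
    """Toy baseline model (hypothetical stand-in for a real classifier)."""
    def fit(self, X, y):
        self.pos = centroid([x for x, t in zip(X, y) if t == 1])
        self.neg = centroid([x for x, t in zip(X, y) if t == 0])
        return self

    def predict_proba(self, x):
        # Score in [0, 1]: closer to the positive centroid gives a higher value.
        dp, dn = dist(x, self.pos), dist(x, self.neg)
        return 0.5 if dp + dn == 0 else dn / (dp + dn)

def out_of_fold_probs(X, y, n_folds):
    """Score each sample with a model trained on the other folds."""
    probs = [0.0] * len(X)
    for f in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != f]
        model = CentroidModel().fit([X[i] for i in train], [y[i] for i in train])
        for i in range(f, len(X), n_folds):
            probs[i] = model.predict_proba(X[i])
    return probs

def probabilistic_features(seqs, y, ks=(1, 2, 3), n_folds=2):
    """One column of predicted probabilities per baseline model."""
    columns = [out_of_fold_probs([kmer_freq(s, k) for s in seqs], y, n_folds)
               for k in ks]
    return [list(row) for row in zip(*columns)]

# Invented toy data: label 1 = ORI-like, label 0 = non-ORI.
seqs = ["ATATTTAAATATTTTA", "TTTAAATATATTTAAT",
        "GCGCCGGCGCGGCCGC", "CCGGCGCGCCGGGCGC",
        "AATTTATATAAATTTA", "TATATTTTAAATATAT",
        "GGCCGCGCGGCCGCGC", "CGCGGCCGGCGCCGCG"]
labels = [1, 1, 0, 0, 1, 1, 0, 0]

# Layer 1: the probabilities from each baseline model form the new feature vector.
meta = probabilistic_features(seqs, labels)
# Layer 2: a second model is trained on that probability vector.
final_model = CentroidModel().fit(meta, labels)
print(len(meta), len(meta[0]))
```

In this sketch each sample ends up with three probabilistic features, one per k-mer size; in the setting described above the corresponding vector would have 40 entries, one per baseline model.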