Department of Physics, School of Science, Tianjin University, Tianjin 300072, China.
Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China.
Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad432.
Accurate identification of replication origins (ORIs) is crucial for a comprehensive investigation into the progression of human cell growth and cancer therapy. Here, we propose a computational approach, Ori-FinderH, which efficiently and precisely predicts human ORIs of various lengths by combining the Z-curve method with a deep learning approach. Compared with existing methods, Ori-FinderH exhibits superior performance, achieving an area under the receiver operating characteristic curve (AUC) of 0.9616 for the K562 cell line in 10-fold cross-validation. In addition, we established a cross-cell-line predictive model, which yielded a further improved AUC of 0.9706. The model was subsequently employed as the fitness function of a genetic algorithm for generating artificial ORIs. Sequence analysis with iORI-Euk revealed that the vast majority of the generated sequences, specifically 98% or more, contain at least one ORI for the three cell lines (HeLa, MCF7 and K562). This approach could provide more efficient, accurate and comprehensive information for experimental investigation of human ORIs, thereby further advancing the development of this field.
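As an illustration of the Z-curve method mentioned above, the following is a minimal sketch of the standard cumulative Z-curve transform, which maps a DNA sequence to three coordinate series. The abstract does not state which Z-curve variant or derived features Ori-FinderH uses, nor how they are fed to the network, so the function name `z_curve` and the handling of ambiguous bases are assumptions made only for this example.

```python
def z_curve(seq):
    """Return the cumulative Z-curve coordinates (x_n, y_n, z_n) of a DNA sequence.

    Hypothetical helper for illustration; not the Ori-FinderH implementation.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    coords = []
    for base in seq.upper():
        # Assumption: ambiguous bases (e.g. N) leave the counts unchanged.
        if base in counts:
            counts[base] += 1
        a, c, g, t = (counts[b] for b in "ACGT")
        coords.append((
            (a + g) - (c + t),  # x_n: purine vs pyrimidine
            (a + c) - (g + t),  # y_n: amino vs keto
            (a + t) - (g + c),  # z_n: weak vs strong hydrogen bonding
        ))
    return coords
```

Calling `z_curve("ACGT")` returns one `(x, y, z)` tuple per position. A numeric representation of this kind is one plausible form of input for a downstream classifier.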