Unveiling human origins of replication using deep learning: accurate prediction and comprehensive analysis.

Affiliations

Department of Physics, School of Science, Tianjin University, Tianjin 300072, China.

Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China.

Publication information

Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad432.

DOI:10.1093/bib/bbad432

PMID:38008420

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10676776/

Abstract

Accurate identification of replication origins (ORIs) is crucial for a comprehensive investigation into the progression of human cell growth and cancer therapy. Here, we proposed a computational approach, Ori-FinderH, which can efficiently and precisely predict human ORIs of various lengths by combining the Z-curve method with a deep learning approach. Compared with existing methods, Ori-FinderH exhibits superior performance, achieving an area under the receiver operating characteristic curve (AUC) of 0.9616 for the K562 cell line in 10-fold cross-validation. In addition, we established a cross-cell-line predictive model, which yielded a further improved AUC of 0.9706. The model was subsequently employed as the fitness function of a genetic algorithm for generating artificial ORIs. Sequence analysis through iORI-Euk revealed that the vast majority of the generated sequences, 98% or more, incorporate at least one ORI for three cell lines (HeLa, MCF7 and K562). This innovative approach could provide more efficient, accurate and comprehensive information for experimental investigation, thereby further advancing the development of this field.
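The abstract names the Z-curve method without describing it. As a rough illustration only, the sketch below computes the classic cumulative Z-curve of a DNA sequence, in which each base moves a point along three axes: purine versus pyrimidine (x), amino versus keto (y), and weak versus strong hydrogen bonding (z). The function name `z_curve` and the handling of ambiguous bases are assumptions made for this example; the abstract does not say which Z-curve-derived features Ori-FinderH actually feeds to its network, so this should not be read as the authors' feature extraction.

```python
def z_curve(seq):
    """Return the cumulative Z-curve points (x_n, y_n, z_n) for n = 1..len(seq)."""
    # Per-base steps along the three axes:
    #   x: purines (A, G) = +1, pyrimidines (C, T) = -1
    #   y: amino (A, C)   = +1, keto (G, T)        = -1
    #   z: weak (A, T)    = +1, strong (G, C)      = -1
    steps = {
        "A": (1, 1, 1),
        "C": (-1, 1, -1),
        "G": (1, -1, -1),
        "T": (-1, -1, 1),
    }
    x = y = z = 0
    points = []
    for base in seq.upper():
        # Assumption for this sketch: ambiguous bases such as N contribute nothing.
        dx, dy, dz = steps.get(base, (0, 0, 0))
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    print(z_curve("ACGT"))  # [(1, 1, 1), (0, 2, 0), (1, 1, -1), (0, 0, 0)]
```

A sequence with equal counts of all four bases returns to the origin, as in the `ACGT` example, while compositional skews show up as drifts along the corresponding axis.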

Similar articles

Unveiling human origins of replication using deep learning: accurate prediction and comprehensive analysis.

Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad432.

A computational platform to identify origins of replication sites in eukaryotes.

Brief Bioinform. 2021 Mar 22;22(2):1940-1950. doi: 10.1093/bib/bbaa017.

PLANNER: a multi-scale deep language model for the origins of replication site prediction.

IEEE J Biomed Health Inform. 2024 Jan 4;PP. doi: 10.1109/JBHI.2024.3349584.

A deep learning framework combined with word embedding to identify DNA replication origins.

Sci Rep. 2021 Jan 12;11(1):844. doi: 10.1038/s41598-020-80670-x.

Accurate Identification of DNA Replication Origin by Fusing Epigenomics and Chromatin Interaction Information.

Research (Wash D C). 2022 Oct 29;2022:9780293. doi: 10.34133/2022/9780293. eCollection 2022.

ORI-Explorer: a unified cell-specific tool for origin of replication sites prediction by feature fusion.

Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad664.

ORI-Deep: improving the accuracy for predicting origin of replication sites by using a blend of features and long short-term memory network.

Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbac001.

iORI-ENST: identifying origin of replication sites based on elastic net and stacking learning.

SAR QSAR Environ Res. 2021 Apr;32(4):317-331. doi: 10.1080/1062936X.2021.1895884. Epub 2021 Mar 18.

Recent Advances on the Machine Learning Methods in Identifying DNA Replication Origins in Eukaryotic Genomics.

Front Genet. 2018 Dec 10;9:613. doi: 10.3389/fgene.2018.00613. eCollection 2018.

Computational prediction of species-specific yeast DNA replication origin via iterative feature representation.

Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa304.

Cited by

DNA sequence analysis landscape: a comprehensive review of DNA sequence analysis task types, databases, datasets, word embedding methods, and language models.

Front Med (Lausanne). 2025 Apr 8;12:1503229. doi: 10.3389/fmed.2025.1503229. eCollection 2025.

Nmix: a hybrid deep learning model for precise prediction of 2'-O-methylation sites based on multi-feature fusion and ensemble learning.

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae601.

DeOri 10.0: An Updated Database of Experimentally Identified Eukaryotic Replication Origins.

Genomics Proteomics Bioinformatics. 2024 Dec 3;22(5). doi: 10.1093/gpbjnl/qzae076.

References

Designing catalysts with deep generative models and computational data. A case study for Suzuki cross coupling reactions.

Digit Discov. 2023 Apr 17;2(3):728-735. doi: 10.1039/d2dd00125j. eCollection 2023 Jun 12.

De novo design of luciferases using deep learning.

Nature. 2023 Feb;614(7949):774-780. doi: 10.1038/s41586-023-05696-3. Epub 2023 Feb 22.

Accurate Identification of DNA Replication Origin by Fusing Epigenomics and Chromatin Interaction Information.

Research (Wash D C). 2022 Oct 29;2022:9780293. doi: 10.34133/2022/9780293. eCollection 2022.

DoriC 12.0: an updated database of replication origins in both complete and draft prokaryotic genomes.

Nucleic Acids Res. 2023 Jan 6;51(D1):D117-D120. doi: 10.1093/nar/gkac964.

Ori-Finder 2022: A Comprehensive Web Server for Prediction and Analysis of Bacterial Replication Origins.

Genomics Proteomics Bioinformatics. 2022 Dec;20(6):1207-1213. doi: 10.1016/j.gpb.2022.10.002. Epub 2022 Oct 17.

Nucleotide spacing distribution analysis for human genome.

Mamm Genome. 2021 Apr;32(2):123-128. doi: 10.1007/s00335-021-09865-5. Epub 2021 Mar 15.

A review on genetic algorithm: past, present, and future.

Multimed Tools Appl. 2021;80(5):8091-8126. doi: 10.1007/s11042-020-10139-6. Epub 2020 Oct 31.

Computational prediction and interpretation of cell-specific replication origin sites from multiple eukaryotes by exploiting stacking framework.

Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa275.

Synthetic promoter design in Escherichia coli based on a deep generative network.

Nucleic Acids Res. 2020 Jul 9;48(12):6403-6412. doi: 10.1093/nar/gkaa325.

Attention in Psychology, Neuroscience, and Machine Learning.

Front Comput Neurosci. 2020 Apr 16;14:29. doi: 10.3389/fncom.2020.00029. eCollection 2020.
