Zeng Yu-Hao, Yin Zhen-Ning, Luo Hao, Gao Feng
Department of Physics, School of Science, Tianjin University, Tianjin 300072, China.
Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China.
Genomics Proteomics Bioinformatics. 2024 Dec 3;22(5). doi: 10.1093/gpbjnl/qzae076.
DNA replication is a complex and crucial biological process in eukaryotes. To facilitate the study of eukaryotic replication events, we present a database of eukaryotic DNA replication origins (DeOri), which collects the genome-wide data on eukaryotic DNA replication origins that are currently available. With the rapid development of high-throughput experimental technology in recent years, the new release, DeOri 10.0, has grown from 10 to 151 datasets and from 16,145 to 9,742,396 sequences. Besides nucleotide sequences and browser extensible data (BED) files, corresponding annotation files for the coding sequences (CDSs), mRNAs, and other biological elements within replication origins are also provided. The web pages also present the experimental technique used for each dataset, along with related statistics. Differences in experimental methods, cell lines, and sequencing technologies have yielded distinct sets of replication origins, making it challenging to distinguish cell-specific from non-specific origins. Based on multiple replication origin datasets at the species level, we scored and screened replication origins in Homo sapiens, Gallus gallus, Mus musculus, Drosophila melanogaster, and Caenorhabditis elegans. The high-scoring regions were regarded as species-conserved origins, which were integrated and presented as reference replication origins (rORIs). Additionally, we analyzed the genome-wide distribution of genomic elements associated with replication origins, such as CpG islands (CGIs), transcription start sites (TSSs), and G-quadruplexes (G4s). These analysis results can be browsed and downloaded as needed at http://tubic.tju.edu.cn/deori/.
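The abstract does not specify how origins are scored across multiple datasets. One simple way to realize the idea of "scoring and screening" is to count, at every genomic position, how many independent datasets report an origin there, and then keep the regions supported by at least a chosen number of datasets. The Python sketch below does this with a sweep over BED-style half-open intervals. The function names `merge_intervals` and `support_regions`, the `min_support` threshold, and the support-count score itself are hypothetical illustrations under these assumptions, not DeOri's published scoring method.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open coordinates


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals within ONE dataset,
    so that each dataset contributes at most 1 to the support at any position."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged: List[Interval] = []
    for chrom in sorted(by_chrom):
        spans = sorted(by_chrom[chrom])
        cur_s, cur_e = spans[0]
        for s, e in spans[1:]:
            if s <= cur_e:  # overlaps or touches the current span
                cur_e = max(cur_e, e)
            else:
                merged.append((chrom, cur_s, cur_e))
                cur_s, cur_e = s, e
        merged.append((chrom, cur_s, cur_e))
    return merged


def support_regions(
    datasets: Dict[str, List[Interval]], min_support: int
) -> List[Tuple[str, int, int, int]]:
    """Hypothetical screening step: return (chrom, start, end, max_support) for
    regions covered by origins from at least `min_support` datasets."""
    events: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for intervals in datasets.values():
        for chrom, start, end in merge_intervals(intervals):
            events[chrom].append((start, 1))   # an origin opens here
            events[chrom].append((end, -1))    # and closes here (exclusive)
    result: List[Tuple[str, int, int, int]] = []
    for chrom in sorted(events):
        depth = 0       # number of datasets covering the current segment
        prev = None     # position of the previous event
        region = None   # open output region as [start, end, max_depth]
        for pos, delta in sorted(events[chrom]):
            # The segment [prev, pos) has constant coverage `depth`.
            if prev is not None and pos > prev and depth >= min_support:
                if region is not None and region[1] == prev:
                    region[1] = pos                       # extend contiguous region
                    region[2] = max(region[2], depth)
                else:
                    if region is not None:
                        result.append((chrom, region[0], region[1], region[2]))
                    region = [prev, pos, depth]
            depth += delta
            prev = pos
        if region is not None:
            result.append((chrom, region[0], region[1], region[2]))
    return result


if __name__ == "__main__":
    toy = {
        "ds1": [("chr1", 100, 200), ("chr1", 150, 250)],
        "ds2": [("chr1", 180, 300)],
        "ds3": [("chr1", 190, 210), ("chr2", 0, 50)],
    }
    print(support_regions(toy, min_support=2))  # [('chr1', 180, 250, 3)]
```

Raising `min_support` trades coverage for stringency: with the toy data above, a threshold of 3 keeps only `chr1:190-210`, the segment reported by all three datasets. Merging within each dataset first prevents a single dataset with redundant peaks from inflating the score.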