Laboratory of Forest Genomics, Institute of Fundamental Biology and Biotechnology, Siberian Federal University, 660036 Krasnoyarsk, Russia.
Laboratory of Genomic Research and Biotechnology, Federal Research Center "Krasnoyarsk Science Center" Siberian Branch, Russian Academy of Sciences, 660036 Krasnoyarsk, Russia.
Int J Mol Sci. 2022 Feb 3;23(3):1735. doi: 10.3390/ijms23031735.
The identification of promoters is an essential step in genome annotation, providing a framework for reconstructing gene regulatory networks and understanding their role in transcription regulation. Despite considerable advances in the high-throughput determination of transcription start sites (TSSs) and transcription factor binding sites (TFBSs), experimental methods remain time-consuming and expensive. As an alternative, several computational approaches have been developed to provide fast and reliable means of predicting the location of TSSs and regulatory motifs on a genome-wide scale. Numerous studies have addressed the regulatory elements of mammalian genomes, but plant promoters, especially those of gymnosperms, have received little attention and remain poorly investigated. The aim of this study was to enhance and expand existing genome annotations using computational approaches for genome-wide prediction of TSSs in four conifer species: loblolly pine, white spruce, Norway spruce, and Siberian larch. Our pipeline will be useful for TSS prediction in other genomes, especially draft assemblies, for which reliable TSS predictions are not usually available. We also explored some features of the nucleotide composition of the predicted promoters and compared the GC properties of conifer genes with those of model monocot and dicot plants. We demonstrate that even incomplete genome assemblies and partial annotations can be a reliable starting point for TSS annotation. The results of the TSS prediction in the four conifer species have been deposited in the Persephone genome browser, which allows smooth visualization and is optimized for large data sets. This work provides an initial basis for future experimental validation and for the study of regulatory regions to understand gene regulation in gymnosperms.
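As a rough illustration of what examining the nucleotide composition of a predicted promoter can involve, the following minimal Python sketch computes GC content for a window around a TSS and a sliding-window GC profile across it. It is not the authors' pipeline: the function names, the 0-based forward-strand coordinate convention, and the window sizes (500 bp upstream, 100 bp downstream, 50 bp sliding window) are all assumptions chosen for the example.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored; an empty or all-N sequence gives 0.0.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def promoter_window(genome_seq: str, tss: int,
                    upstream: int = 500, downstream: int = 100) -> str:
    """Slice a region around a TSS (assumed 0-based, forward strand).

    The default window sizes are illustrative assumptions, not values
    taken from the study. The slice is clipped to the sequence bounds.
    """
    start = max(0, tss - upstream)
    end = min(len(genome_seq), tss + downstream)
    return genome_seq[start:end]


def sliding_gc(seq: str, window: int = 50, step: int = 10) -> list[float]:
    """GC content in consecutive windows of length `window`, moved by `step`."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    # Toy sequence: an AT-rich stretch followed by a GC-rich stretch.
    toy = "AT" * 300 + "GC" * 100
    region = promoter_window(toy, tss=600)
    print(len(region), round(gc_content(region), 3))
    print(sliding_gc(region, window=100, step=100))
```

For genes on the reverse strand, the extracted window would need to be reverse-complemented and the upstream and downstream offsets swapped; the sketch leaves that out for brevity.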