Ugra Research Institute of Information Technologies, Khanty-Mansiysk, Russia.
Vavilov Institute of General Genetics, Moscow, Russia.
Methods Mol Biol. 2021;2238:261-274. doi: 10.1007/978-1-0716-1068-8_17.
As interest in genetic resequencing increases, so does the need for effective mathematical, computational, and statistical approaches. One of the difficult problems in genome annotation is determining the precise positions of transcription start sites (TSS). In this paper, we present TransPrise, an efficient deep learning tool for predicting the positions of eukaryotic transcription start sites. TransPrise offers a significant improvement over existing promoter-prediction methods. To illustrate this, we compared the predictions of TransPrise with those of the TSSPlant approach for the well-annotated genome of Oryza sativa. On a computer with a graphics processing unit, TransPrise takes 250 min to process a 374 Mb genome. We provide the full basis for the comparison and encourage users to freely access our set of computational tools to facilitate and streamline their own analyses. A ready-to-use Docker image with all the necessary packages, models, and code, as well as the source code of the TransPrise algorithm, is available at http://compubioverne.group/. The source code is ready to use and can be customized to predict TSS in any eukaryotic organism.
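To put the reported run time in perspective, the short sketch below converts the two figures quoted in the abstract (250 min, 374 Mb) into a throughput and uses it for a rough run-time estimate on another genome. The function names and the 100 Mb example genome are hypothetical, and the estimate assumes that run time scales linearly with genome length on comparable hardware, which the abstract does not state.

```python
# Figures quoted in the abstract.
GENOME_SIZE_MB = 374.0   # genome length, in megabases
RUN_TIME_MIN = 250.0     # reported run time on a GPU machine, in minutes


def throughput_mb_per_min(genome_mb: float, minutes: float) -> float:
    """Return the average processing rate in megabases per minute."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return genome_mb / minutes


def estimated_minutes(genome_mb: float, rate_mb_per_min: float) -> float:
    """Estimate run time for a genome of the given size.

    ASSUMPTION: run time grows linearly with genome length; this is an
    illustration, not a property claimed by the TransPrise authors.
    """
    if rate_mb_per_min <= 0:
        raise ValueError("rate must be positive")
    return genome_mb / rate_mb_per_min


rate = throughput_mb_per_min(GENOME_SIZE_MB, RUN_TIME_MIN)
bases_per_second = rate * 1_000_000 / 60

print(f"Throughput: {rate:.3f} Mb/min ({bases_per_second:,.0f} bases/s)")

# Hypothetical 100 Mb genome, used only to show the extrapolation.
print(f"Estimated run time for 100 Mb: {estimated_minutes(100.0, rate):.1f} min")
```

Under these assumptions the reported figures correspond to roughly 1.5 Mb per minute; actual run times will depend on the hardware and on the configuration used.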