

Promoter analysis and prediction in the human genome using sequence-based deep learning models.

Affiliations

Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia.

Department of Cell Biology, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia.

Publication information

Bioinformatics. 2019 Aug 15;35(16):2730-2737. doi: 10.1093/bioinformatics/bty1068.

Abstract

MOTIVATION

Computational identification of promoters is notoriously difficult because human genes often have unique promoter sequences that regulate transcription and interact with the transcription initiation complex. Although there have been many attempts to develop computational promoter identification methods, there is still no reliable tool for analyzing long genomic sequences.

RESULTS

In this work, we further develop our deep learning approach, which was relatively successful at discriminating between short promoter and non-promoter sequences. Instead of focusing on classification accuracy, we predict the exact position of the transcription start site (TSS) within genomic sequences by testing every possible location. We studied human promoters to find the regions most effective for discrimination and built corresponding deep learning models. These models use an adaptively constructed negative set, which iteratively improves their discriminative ability. Our method significantly outperforms previously developed promoter prediction programs by considerably reducing the number of false-positive predictions. We achieved an error rate of 0.02 errors per 1000 bp and 0.31 errors per correct prediction, which is significantly better than the results of other human promoter predictors.
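The adaptively constructed negative set can be illustrated with a generic hard-negative-mining loop: train on promoter windows plus random non-promoter windows, score non-promoter sequence at every possible position, add the windows that score above a threshold (false positives) to the negative set, and retrain. This is a minimal sketch under stated assumptions, not the authors' procedure. PromID uses deep neural networks, whereas here a toy k-mer log-odds scorer (`KmerLogOddsScorer`, hypothetical) stands in for the model, and `width`, `threshold`, `rounds`, `per_round` and the synthetic `TATAAA` data are illustrative choices only.

```python
import math
import random
from collections import Counter
from typing import List, Tuple

K = 3  # k-mer size for the toy scorer (hypothetical choice)


def kmers(seq: str, k: int = K) -> List[str]:
    """All overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerLogOddsScorer:
    """Toy stand-in for a deep model (assumption, not PromID's network):
    a window's score is the sum of Laplace-smoothed k-mer log-odds
    (promoter vs. non-promoter)."""

    def fit(self, positives: List[str], negatives: List[str]) -> "KmerLogOddsScorer":
        pos = Counter(km for s in positives for km in kmers(s))
        neg = Counter(km for s in negatives for km in kmers(s))
        vocab = set(pos) | set(neg)
        pos_total = sum(pos.values()) + len(vocab)
        neg_total = sum(neg.values()) + len(vocab)
        self.weights = {
            km: math.log((pos[km] + 1) / pos_total) - math.log((neg[km] + 1) / neg_total)
            for km in vocab
        }
        return self

    def score(self, window: str) -> float:
        # k-mers never seen in training contribute 0 (no evidence either way)
        return sum(self.weights.get(km, 0.0) for km in kmers(window))


def scan(model: KmerLogOddsScorer, seq: str, width: int) -> List[Tuple[int, float]]:
    """Score the window starting at every possible position of seq."""
    return [(i, model.score(seq[i:i + width])) for i in range(len(seq) - width + 1)]


def train_with_adaptive_negatives(
    positives: List[str],
    background: List[str],
    width: int,
    rounds: int = 3,
    threshold: float = 0.0,
    per_round: int = 50,
    seed: int = 0,
) -> Tuple[KmerLogOddsScorer, List[str]]:
    """Hard-negative mining: repeatedly add high-scoring non-promoter windows and retrain."""
    rng = random.Random(seed)
    # Initial negatives: random windows drawn from the non-promoter background
    negatives = []
    for _ in range(len(positives)):
        seq = rng.choice(background)
        start = rng.randrange(len(seq) - width + 1)
        negatives.append(seq[start:start + width])

    model = KmerLogOddsScorer().fit(positives, negatives)
    for _ in range(rounds):
        # Any background window scoring above threshold is a false positive
        false_pos = [
            (score, seq[i:i + width])
            for seq in background
            for i, score in scan(model, seq, width)
            if score > threshold
        ]
        if not false_pos:
            break  # no false positives left at this threshold
        false_pos.sort(key=lambda item: item[0], reverse=True)
        negatives.extend(window for _, window in false_pos[:per_round])
        model = KmerLogOddsScorer().fit(positives, negatives)
    return model, negatives


# Synthetic demo data (purely illustrative): "promoters" carry a TATAAA motif
demo_rng = random.Random(1)


def random_dna(n: int) -> str:
    return "".join(demo_rng.choice("ACGT") for _ in range(n))


positives = [random_dna(20) + "TATAAA" + random_dna(24) for _ in range(30)]
background = [random_dna(500) for _ in range(5)]
model, negatives = train_with_adaptive_negatives(positives, background, width=50)
```

Replacing the toy scorer with a trained network would change only `fit` and `score`; the negative set becomes adaptive because each round adds only windows that the current model misclassifies.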
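The two error rates quoted in the results can be made concrete with a small calculation. The sketch below assumes definitions that the abstract does not spell out: false positives divided by sequence length in kilobases, and false positives divided by correct predictions, where a prediction counts as correct if it falls within a `tolerance` of an annotated TSS. The function name `evaluate_tss_predictions` and the 500 bp default tolerance are hypothetical, not taken from PromID.

```python
from typing import List, Tuple


def evaluate_tss_predictions(
    predicted: List[int],
    true_tss: List[int],
    seq_length: int,
    tolerance: int = 500,  # hypothetical matching window in bp
) -> Tuple[float, float]:
    """Return (errors per 1000 bp, errors per correct prediction).

    Assumed definitions: a prediction is correct if it lies within `tolerance`
    bp of a not-yet-matched annotated TSS; every other prediction is an error
    (false positive).
    """
    matched = set()
    correct = 0
    errors = 0
    for pos in sorted(predicted):
        # Greedily match to the first unmatched TSS within tolerance
        hit = next(
            (i for i, tss in enumerate(true_tss)
             if i not in matched and abs(pos - tss) <= tolerance),
            None,
        )
        if hit is None:
            errors += 1
        else:
            matched.add(hit)
            correct += 1
    errors_per_kb = errors / (seq_length / 1000)
    errors_per_correct = errors / correct if correct else float("inf")
    return errors_per_kb, errors_per_correct


# 10 kb toy sequence with two annotated TSSs and four predictions
print(evaluate_tss_predictions([2100, 5000, 7050, 9000], [2000, 7000], 10_000))
# → (0.2, 1.0)
```

Greedy matching means a second prediction near an already-matched TSS is counted as an error, so duplicate calls around one TSS raise both rates.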

AVAILABILITY AND IMPLEMENTATION

The developed method is available as a web server at http://www.cbrc.kaust.edu.sa/PromID/.

