SnowyOwl: accurate prediction of fungal genes by using RNA-Seq and homology information to select among ab initio models.

Author information

Reid Ian, O'Toole Nicholas, Zabaneh Omar, Nourzadeh Reza, Dahdouli Mahmoud, Abdellateef Mostafa, Gordon Paul M K, Soh Jung, Butler Gregory, Sensen Christoph W, Tsang Adrian

Affiliations

Centre for Structural and Functional Genomics, Concordia University, 7141 Sherbrooke St, W, Montreal, QC H4B 1R6, Canada.

Publication information

BMC Bioinformatics. 2014 Jul 1;15:229. doi: 10.1186/1471-2105-15-229.

Abstract

BACKGROUND

Locating the protein-coding genes in novel genomes is essential to understanding and exploiting genomic information, but accurately predicting all the genes remains difficult. The recent availability of detailed transcript-structure information from high-throughput sequencing of messenger RNA (RNA-Seq) delineates many expressed genes and promises increased accuracy in gene prediction. Computational gene predictors have been intensively developed for, and tested in, well-studied animal genomes. Hundreds of fungal genomes have now been, or will soon be, sequenced. Fungal genomes differ from animal genomes, and well-studied fungi are phylogenetically sparse, so fungi call for gene-prediction tools tailored to them.

RESULTS

SnowyOwl is a new gene prediction pipeline that uses RNA-Seq data to train and provide hints for the generation of Hidden Markov Model (HMM)-based gene predictions and to evaluate the resulting models. The pipeline has been developed and streamlined by comparing its predictions to manually curated gene models in three fungal genomes and validated against the high-quality gene annotation of Neurospora crassa; SnowyOwl predicted N. crassa genes with 83% sensitivity and 65% specificity. SnowyOwl gains sensitivity by repeatedly running the HMM gene predictor Augustus with varied input parameters and selectivity by choosing the models with best homology to known proteins and best agreement with the RNA-Seq data.
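The selection step described above, choosing among overlapping candidate models from repeated predictor runs, can be illustrated with a minimal Python sketch. This is not SnowyOwl's actual scoring scheme, which the abstract does not specify. The `CandidateModel` fields, the 0-to-1 score scales, the equal weighting, and the greedy non-overlap rule are all assumptions made for illustration, and strand and contig are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateModel:
    """One gene model produced by one predictor run (hypothetical record)."""
    model_id: str
    start: int              # 1-based inclusive start on a single contig
    end: int                # 1-based inclusive end
    homology_score: float   # assumed scale 0..1: similarity to known proteins
    rnaseq_score: float     # assumed scale 0..1: agreement with RNA-Seq evidence


def combined_score(model: CandidateModel, homology_weight: float = 0.5) -> float:
    """Weighted mean of the two evidence scores (the weighting is an assumption)."""
    return (homology_weight * model.homology_score
            + (1.0 - homology_weight) * model.rnaseq_score)


def overlaps(a: CandidateModel, b: CandidateModel) -> bool:
    """True if the two models share at least one base."""
    return a.start <= b.end and b.start <= a.end


def select_models(candidates, homology_weight: float = 0.5):
    """Greedily keep the best-scoring model at each locus.

    Candidates are visited from highest to lowest combined score; a model is
    kept only if it does not overlap a model that was already kept.
    """
    ranked = sorted(candidates,
                    key=lambda m: combined_score(m, homology_weight),
                    reverse=True)
    chosen = []
    for model in ranked:
        if not any(overlaps(model, kept) for kept in chosen):
            chosen.append(model)
    return sorted(chosen, key=lambda m: m.start)


# Made-up candidates from three hypothetical predictor runs at two loci.
candidates = [
    CandidateModel("run1.g1", 100, 900, 0.80, 0.60),
    CandidateModel("run2.g1", 120, 950, 0.90, 0.90),
    CandidateModel("run3.g1", 100, 500, 0.40, 0.70),
    CandidateModel("run1.g2", 2000, 2600, 0.10, 0.85),
    CandidateModel("run2.g2", 2050, 2600, 0.00, 0.50),
]
selected = select_models(candidates)
selected_ids = [m.model_id for m in selected]
print(selected_ids)
```

Running more predictor configurations adds candidates at each locus, which is where the sensitivity gain would come from. The evidence-based ranking then discards the weaker overlapping alternatives, which is the selectivity side of the trade-off.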

CONCLUSIONS

SnowyOwl efficiently uses RNA-Seq data to produce accurate gene models in both well-studied and novel fungal genomes. The source code for the SnowyOwl pipeline (in Python) and a web interface (in PHP) is freely available from http://sourceforge.net/projects/snowyowl/.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58e9/4084796/ff72e7828921/1471-2105-15-229-1.jpg
