Minoche André E, Dohm Juliane C, Schneider Jessica, Holtgräwe Daniela, Viehöver Prisca, Montfort Magda, Sörensen Thomas Rosleff, Weisshaar Bernd, Himmelbauer Heinz
Max Planck Institute for Molecular Genetics, Berlin, Germany.
Centre for Genomic Regulation (CRG), Barcelona, Spain.
Genome Biol. 2015 Sep 2;16(1):184. doi: 10.1186/s13059-015-0729-7.
We develop a method to predict and validate gene models using PacBio single-molecule, real-time (SMRT) cDNA reads. Ninety-eight percent of full-insert SMRT reads span complete open reading frames. Gene model validation using SMRT reads is implemented as an automated process. Optimized training and prediction settings, together with mRNA-seq noise reduction of the assisting Illumina reads, result in increased gene prediction sensitivity and precision. Additionally, we present an improved gene set for sugar beet (Beta vulgaris) and the first genome-wide gene set for spinach (Spinacia oleracea). The workflow and guidelines are a valuable resource for obtaining comprehensive gene sets for newly sequenced genomes of non-model eukaryotes.
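To illustrate what "spanning a complete open reading frame" can mean for a cDNA read, here is a minimal sketch. It is not the authors' pipeline: it assumes reads are already oriented in the sense direction, scans only the three forward frames, and treats an ORF as complete when an ATG is followed by an in-frame stop codon. The function names and the `min_codons` threshold are hypothetical choices made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_complete_orf(seq, min_codons=30):
    """Return (start, end) of the longest ATG-to-stop ORF, or None.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    Only the three forward frames are scanned (assumes sense-oriented reads).
    min_codons is an illustrative threshold that counts the stop codon.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                long_enough = (end - start) // 3 >= min_codons
                longer = best is None or (end - start) > (best[1] - best[0])
                if long_enough and longer:
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best


def fraction_with_complete_orf(reads, min_codons=30):
    """Fraction of reads that contain at least one complete ORF."""
    if not reads:
        return 0.0
    hits = sum(1 for r in reads if longest_complete_orf(r, min_codons) is not None)
    return hits / len(reads)
```

A read that is truncated before its stop codon yields `None` under this definition, so the second function gives a rough per-library estimate of how many reads carry a complete coding sequence. A real analysis would also need to handle antisense reads, sequencing errors, and untranslated regions.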