Consideration of non-canonical splice sites improves gene prediction on the Arabidopsis thaliana Niederzenz-1 genome sequence.

Author information

Pucker Boas, Holtgräwe Daniela, Weisshaar Bernd

Affiliation

Faculty of Biology & Center for Biotechnology, Bielefeld University, Bielefeld, Germany.

Publication information

BMC Res Notes. 2017 Dec 4;10(1):667. doi: 10.1186/s13104-017-2985-y.

DOI:10.1186/s13104-017-2985-y

PMID:29202864

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5716242/

Abstract

OBJECTIVE

The Arabidopsis thaliana Niederzenz-1 genome sequence was recently published with an ab initio gene prediction. In-depth analysis of the predicted gene set revealed errors involving genes with non-canonical splice sites in their introns. Since non-canonical splice sites are difficult to predict ab initio, we checked for options to improve the annotation by transferring annotation information from Araport11, the recently released annotation of the Columbia-0 reference genome sequence.
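To make the distinction concrete: an intron is commonly called canonical when it begins with GT and ends with AG, and non-canonical otherwise. The following minimal Python sketch classifies an intron by its terminal dinucleotides. The function name, the coordinate convention (0-based, end-exclusive, forward strand), and the toy sequences are assumptions for illustration and are not taken from the authors' pipeline.

```python
def classify_intron(sequence, intron_start, intron_end):
    """Return (motif, is_canonical) for one intron.

    Assumptions (illustrative only): coordinates are 0-based and
    end-exclusive, the intron lies on the forward strand, and only
    GT-AG is treated as canonical.
    """
    intron = sequence[intron_start:intron_end].upper()
    if len(intron) < 4:
        raise ValueError("intron too short to hold donor and acceptor sites")
    donor = intron[:2]       # first two bases of the intron (5' splice site)
    acceptor = intron[-2:]   # last two bases of the intron (3' splice site)
    motif = f"{donor}-{acceptor}"
    return motif, motif == "GT-AG"


# Toy sequences: three exon bases, an eight-base intron, three exon bases.
canonical_example = "AAAGTCCCCAGTTT"
non_canonical_example = "AAAGCCCCCAGTTT"

canonical_result = classify_intron(canonical_example, 3, 11)
non_canonical_result = classify_intron(non_canonical_example, 3, 11)
```

An ab initio predictor that only models GT-AG introns would not report the second intron, which is why external evidence is needed for such cases.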

RESULTS

Incorporation of hints generated from Araport11 enabled the precise prediction of non-canonical splice sites. Manual inspection of RNA-Seq read mapping and RT-PCR were applied to validate the structural annotations of non-canonical splice sites. Predictions of untranslated regions were also updated by harnessing the potential of Araport11's information, which was generated by using high coverage RNA-Seq data. The improved gene set of the Nd-1 genome assembly (GeneSet_Nd-1_v1.1) was evaluated via comparison to the initial gene prediction (GeneSet_Nd-1_v1.0) as well as against Araport11 for the Col-0 reference genome sequence. GeneSet_Nd-1_v1.1 contains previously missed non-canonical splice sites in 1256 genes. Reciprocal best hits for 24,527 (89.4%) of all nuclear Col-0 genes against the GeneSet_Nd-1_v1.1 indicate a high gene prediction quality.
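The reciprocal best hit criterion used in the evaluation above can be sketched as follows: gene A in one set and gene B in the other form a pair only if B is the top-scoring hit for A and A is the top-scoring hit for B. This is a minimal Python sketch under stated assumptions; the (query, subject, score) tuple layout, the function names, and the gene identifiers for the Nd-1 set are hypothetical, and the abstract does not specify which search tool or score produced the hits.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, score) tuples; this layout
    is an assumption, e.g. parsed from a tabular similarity search output.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical toy data: three Col-0 genes searched against two Nd-1 genes.
col_vs_nd = [
    ("AT1G01010", "Nd1_g1", 950.0),
    ("AT1G01010", "Nd1_g2", 300.0),
    ("AT1G01020", "Nd1_g2", 500.0),
    ("AT1G01030", "Nd1_g2", 480.0),
]
nd_vs_col = [
    ("Nd1_g1", "AT1G01010", 940.0),
    ("Nd1_g2", "AT1G01020", 510.0),
]

pairs = reciprocal_best_hits(col_vs_nd, nd_vs_col)
col_genes = {query for query, _, _ in col_vs_nd}
fraction_with_rbh = len(pairs) / len(col_genes)
```

In this toy data, AT1G01030 has a best hit in the other set but is not that gene's best hit in return, so it does not count toward the reciprocal fraction.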

