Gene2DGE: a Perl package for gene model renewal with digital gene expression data.

Author information

Faculty of Basic Medical Science, Nanchang University, Nanchang 330006, China.

Publication information

Genomics Proteomics Bioinformatics. 2012 Feb;10(1):51-4. doi: 10.1016/S1672-0229(11)60033-8.

Abstract

For transcriptome analysis, it is critical to define precisely all the transcripts across the whole genome. A growing number of digital gene expression (DGE) scans have indicated the presence of a large number of novel transcripts in addition to the known gene models. However, almost all of these studies still depend crucially on existing annotation. Here, we present Gene2DGE, a Perl software package for gene model renewal with DGE data. We applied Gene2DGE to the mouse blastomere transcriptome and defined 98,532 read-enriched regions (RERs) by read clustering, requiring each base pair to be supported by more than four reads. Taking advantage of this ab initio method, we refined 2,104 exonic regions (4% of a total of 48,501 annotated transcribed regions) that extended substantially (>50 bp) into unannotated regions. From the 5% of uniquely mapped reads that fell within intronic regions, we identified 13,291 additional possible exons. As a result, we renewed 4,788 gene models, which account for 39% of a total of 12,277 transcribed genes. Furthermore, we identified 12,613 intergenic RERs, suggesting the possible presence of novel genes outside the existing gene models. In this study, therefore, we have developed a suitable tool for renewing known gene models by ab initio prediction in transcriptome dissection. The Gene2DGE package is freely available at http://bighapmap.big.ac.cn/.
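
The read-clustering step described in the abstract can be illustrated with a small sketch. This is not the Gene2DGE implementation (which is written in Perl and is not reproduced here); it is a minimal Python example that assumes reads are already uniquely mapped to a single chromosome and given as half-open `(start, end)` intervals. The function name `read_enriched_regions` and the parameter `min_depth` are hypothetical, and "more than four reads for each base pair" is interpreted as a per-base depth of at least 5.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def read_enriched_regions(reads: Iterable[Interval],
                          min_depth: int = 5) -> List[Interval]:
    """Return maximal runs of positions covered by at least min_depth reads.

    Assumption: min_depth=5 encodes "more than four reads per base pair".
    """
    # Sweep line: coverage rises by 1 at a read start and falls by 1 at its end.
    events: Dict[int, int] = {}
    for start, end in reads:
        if end <= start:
            continue  # skip empty or malformed intervals
        events[start] = events.get(start, 0) + 1
        events[end] = events.get(end, 0) - 1

    regions: List[Interval] = []
    depth = 0
    region_start = None
    for pos in sorted(events):
        depth += events[pos]
        if depth >= min_depth and region_start is None:
            region_start = pos                    # coverage reaches the threshold
        elif depth < min_depth and region_start is not None:
            regions.append((region_start, pos))   # coverage drops below it
            region_start = None
    return regions


# Toy input: five reads at 100-136, three at 120-156, two at 300-336.
toy_reads = [(100, 136)] * 5 + [(120, 156)] * 3 + [(300, 336)] * 2
print(read_enriched_regions(toy_reads))
```

In this toy input only positions 100 to 135 reach a depth of five or more, so a single region is reported. Comparing such regions with annotated exon coordinates (for example, flagging extensions longer than 50 bp) would be a separate step that is not shown here.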
