EasyCluster2: an improved tool for clustering and assembling long transcriptome reads.

Publication information

BMC Bioinformatics. 2014;15 Suppl 15(Suppl 15):S7. doi: 10.1186/1471-2105-15-S15-S7. Epub 2014 Dec 3.

Abstract

BACKGROUND

Expressed sequences (e.g. ESTs) are a strong source of evidence for improving gene structures and predicting reliable alternative splicing events. When a genome assembly is available, ESTs can be grouped into gene-oriented clusters with the well-established EasyCluster software. EST-like sequences can now be produced massively using Next Generation Sequencing (NGS) technologies. To handle genome-scale transcriptome data, we present EasyCluster2, a reimplementation of EasyCluster that speeds up the creation of gene-oriented clusters and facilitates downstream analyses such as the assembly of full-length transcripts and the detection of splicing isoforms.

RESULTS

EasyCluster2 has been developed to facilitate the genome-based clustering of EST-like sequences generated with 454 NGS technology. Reads mapped onto the reference genome can be uploaded in the standard GFF3 file format. Alignments are first parsed to produce an initial set of pseudo-clusters, grouping reads whose genomic coordinates overlap on the same strand. EasyCluster2 then refines the grouping so that each cluster includes only reads sharing at least one splice site, and optionally performs a Smith-Waterman alignment in the region surrounding splice sites to correct potential alignment errors. In addition, EasyCluster2 can include unspliced reads, which generally account for more than 50% of 454 datasets, and collapses overlapping clusters. Finally, EasyCluster2 can assemble full-length transcripts using a strategy based on directed acyclic graphs (DAGs), which, together with an implementation of the widely used AStalavista methodology, simplifies the identification of alternative splicing isoforms. Accuracy and performance were tested on both real and simulated datasets.
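
The two clustering steps described above (pseudo-clusters from overlapping coordinates on the same strand, then refinement by shared splice sites) can be illustrated with the minimal Python sketch below. It is not EasyCluster2's code. It assumes reads have already been parsed from GFF3 into a hypothetical `Read` record with sorted, 1-based inclusive exon blocks; the helper names (`overlap_groups`, `refine`, `cluster_reads`) are illustrative, and the rule for placing unspliced reads (join the cluster whose spliced reads share the most exonic bases, otherwise group leftovers by overlap) is our own assumption. The optional Smith-Waterman correction and the collapsing of overlapping clusters are not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical record for one aligned read; exons are sorted (start, end), 1-based inclusive."""
    rid: str
    chrom: str
    strand: str
    exons: list

    @property
    def start(self):
        return self.exons[0][0]

    @property
    def end(self):
        return self.exons[-1][1]

    def splice_sites(self):
        # Tag each boundary as left ("L") or right ("R") of an intron so the two kinds never collide.
        sites = set()
        for (_, left_end), (right_start, _) in zip(self.exons, self.exons[1:]):
            sites.add(("L", left_end))
            sites.add(("R", right_start))
        return sites


def overlap_groups(reads):
    """Pseudo-clusters: reads whose genomic spans overlap on the same chromosome and strand."""
    by_locus = defaultdict(list)
    for r in reads:
        by_locus[(r.chrom, r.strand)].append(r)
    groups = []
    for locus_reads in by_locus.values():
        locus_reads.sort(key=lambda r: r.start)
        current, current_end = [], 0
        for r in locus_reads:
            if current and r.start <= current_end:
                current_end = max(current_end, r.end)
                current.append(r)
            else:
                if current:
                    groups.append(current)
                current, current_end = [r], r.end
        if current:
            groups.append(current)
    return groups


def exonic_overlap(a, b):
    """Number of genomic bases covered by exon blocks of both reads."""
    return sum(max(0, min(e1, e2) - max(s1, s2) + 1)
               for s1, e1 in a.exons for s2, e2 in b.exons)


def refine(group):
    """Split one pseudo-cluster by shared splice sites, then place unspliced reads."""
    spliced = [r for r in group if len(r.exons) > 1]
    unspliced = [r for r in group if len(r.exons) == 1]
    parent = list(range(len(spliced)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    owner = {}  # splice site -> index of the first spliced read that carries it
    for i, r in enumerate(spliced):
        for site in r.splice_sites():
            if site in owner:
                parent[find(i)] = find(owner[site])
            else:
                owner[site] = i
    members = defaultdict(list)
    for i, r in enumerate(spliced):
        members[find(i)].append(r)
    clusters = list(members.values())

    leftovers = []
    for u in unspliced:
        # Assumed rule: join the cluster whose spliced reads share the most exonic bases with u.
        best, best_ov = None, 0
        for c in clusters:
            ov = max(exonic_overlap(u, r) for r in c if len(r.exons) > 1)
            if ov > best_ov:
                best, best_ov = c, ov
        if best is None:
            leftovers.append(u)
        else:
            best.append(u)
    return clusters + overlap_groups(leftovers)


def cluster_reads(reads):
    """Step 1 (overlap pseudo-clusters) followed by step 2 (splice-site refinement)."""
    return [c for group in overlap_groups(reads) for c in refine(group)]
```

In this sketch the union-find merges reads transitively, so two spliced reads with no site in common still end up together if a third read shares a site with each of them; a read whose span overlaps a cluster but shares no splice site is split off into its own cluster.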

CONCLUSIONS

EasyCluster2 is a unique tool for clustering and assembling transcriptome reads produced with 454 technology, as well as ESTs and full-length transcripts. The clustering procedure is enhanced by the use of genome annotations and unspliced reads. Overall, EasyCluster2 detects splicing isoforms effectively, since it can refine exon-exon junctions and explore alternative splicing without known reference transcripts. Results, in GFF3 format, can be browsed in the UCSC Genome Browser. EasyCluster2 is therefore a powerful tool for generating reliable clusters for gene expression studies, and it makes the analysis accessible also to researchers without bioinformatics expertise.
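
To illustrate how assembling reads without reference transcripts can expose alternative splicing, here is a minimal Python sketch of a generic splice-graph approach. It is an assumption for illustration, not EasyCluster2's published DAG strategy. Nodes are the introns observed in the spliced reads of one cluster, and an edge joins two introns that appear consecutively in some read; because coordinates only increase along a read, the graph is acyclic. Each source-to-sink path is reported as a candidate transcript, and a small helper writes the transcripts as GFF3 `mRNA` and `exon` features. The names `introns_of`, `assemble`, `to_gff3` and the default `source` label are hypothetical.

```python
from collections import defaultdict


def introns_of(exons):
    """Introns as (last base of left exon, first base of right exon), left to right."""
    return [(e1, s2) for (_, e1), (s2, _) in zip(exons, exons[1:])]


def assemble(reads_exons):
    """Hypothetical sketch: enumerate candidate transcripts for one cluster.

    reads_exons is a list of sorted exon lists (1-based, inclusive) from reads on
    the same chromosome and strand. Nodes are introns; an edge joins two introns
    that appear consecutively in at least one read.
    """
    succ = defaultdict(set)
    has_pred = set()
    nodes = set()
    left_bound, right_bound = {}, {}
    for exons in reads_exons:
        chain = introns_of(exons)
        if not chain:
            continue  # unspliced reads carry no junction information here
        nodes.update(chain)
        for a, b in zip(chain, chain[1:]):
            succ[a].add(b)
            has_pred.add(b)
        # Outermost read coordinates become the transcript's terminal exon ends.
        first, last = chain[0], chain[-1]
        left_bound[first] = min(left_bound.get(first, exons[0][0]), exons[0][0])
        right_bound[last] = max(right_bound.get(last, exons[-1][1]), exons[-1][1])

    transcripts = []

    def walk(path):
        tail = path[-1]
        if not succ[tail]:
            # Sink reached: turn the intron chain back into exon blocks.
            starts = [left_bound[path[0]]] + [acc for _, acc in path]
            ends = [don for don, _ in path] + [right_bound[tail]]
            transcripts.append(list(zip(starts, ends)))
            return
        for nxt in sorted(succ[tail]):
            walk(path + [nxt])

    for source in sorted(nodes - has_pred):
        walk([source])
    return transcripts


def to_gff3(transcripts, chrom, strand, source="sketch"):
    """Format transcripts as GFF3 mRNA/exon features (9 tab-separated columns)."""
    lines = ["##gff-version 3"]
    for i, exons in enumerate(transcripts, 1):
        tid = f"t{i}"
        cols = [chrom, source, "mRNA", str(exons[0][0]), str(exons[-1][1]), ".", strand, ".", f"ID={tid}"]
        lines.append("\t".join(cols))
        for s, e in exons:
            cols = [chrom, source, "exon", str(s), str(e), ".", strand, ".", f"Parent={tid}"]
            lines.append("\t".join(cols))
    return "\n".join(lines)
```

For example, reads with exons `[(100,200),(300,400),(500,600)]` and `[(150,200),(500,650)]` yield two paths, one including and one skipping the middle exon (an exon-skipping event). Plain path enumeration can also combine junctions never seen together in one read, and the number of paths can grow quickly in complex loci, so a real tool would need extra filtering.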

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/785d/4271567/251194d434bf/1471-2105-15-S15-S7-1.jpg