The Genome Institute, Washington University School of Medicine, 4444 Forest Park Boulevard, St. Louis, MO 63108, USA.
Parasit Vectors. 2014 Apr 1;7:151. doi: 10.1186/1756-3305-7-151.
Alternative splicing (AS) of mRNA is a vital mechanism for enhancing genomic complexity in eukaryotes. Spliced isoforms of the same gene can have diverse molecular and biological functions and are often differentially expressed across tissues, developmental times, and conditions. Thus, AS has important implications for the study of parasitic nematodes with complex life cycles. Transcriptomic datasets are available from many species, but the data must be revisited with splice-aware assembly protocols to facilitate the study of AS in helminths.
We sequenced cDNA from the model worm Caenorhabditis elegans using 454/Roche technology for use as an experimental dataset. Reads were assembled with Newbler software, invoking the cDNA option. Several combinations of parameters were tested and assembled transcripts were verified by comparison with previously reported C. elegans genes and transcript isoforms and with Illumina RNAseq data.
Thoughtful adjustment of program parameters increased the percentage of assembled transcripts that matched known C. elegans sequences, decreased mis-assembly rates (i.e., cis- and trans-chimeras), and improved coverage of the gene set. The optimized protocol was used to update de novo transcriptome assemblies from nine parasitic nematode species, including important pathogens of humans and domestic animals. Our assemblies indicated AS rates in the range of 20-30%, typically with 2-3 transcripts per AS locus, depending on the species. Transcript isoforms from the nine species were translated and searched for similarity to known proteins and functional domains. Some 21 InterPro domains, including several involved in nucleotide and chromatin binding, were statistically correlated with AS loci. In most cases, the Roche/454 data explored in this study are the only sequences available from the species in question; however, the recently published genome of the human hookworm Necator americanus provided an additional opportunity to validate our results.
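As an illustration of the two summary statistics quoted above (AS rate and transcripts per AS locus), the following minimal sketch computes them from a mapping of transcript IDs to locus IDs. It is not the authors' pipeline: the function name `summarize_as`, the toy identifiers, and the definition of an AS locus as any locus with more than one assembled transcript are assumptions made for this example.

```python
from collections import Counter


def summarize_as(transcript_to_locus):
    """Return (AS rate, mean transcripts per AS locus).

    Assumption for this sketch: a locus counts as alternatively
    spliced (AS) when more than one transcript maps to it.
    """
    # Count how many transcripts were assigned to each locus.
    counts = Counter(transcript_to_locus.values())
    if not counts:
        raise ValueError("no transcripts supplied")

    # Keep only loci with two or more transcripts.
    as_counts = [n for n in counts.values() if n > 1]

    # Fraction of all loci that are AS loci.
    as_rate = len(as_counts) / len(counts)

    # Mean number of transcripts among AS loci (0.0 if there are none).
    mean_isoforms = sum(as_counts) / len(as_counts) if as_counts else 0.0
    return as_rate, mean_isoforms


# Hypothetical toy input: eight transcripts assigned to five loci.
example = {
    "t1": "locusA", "t2": "locusA",
    "t3": "locusB",
    "t4": "locusC", "t5": "locusC", "t6": "locusC",
    "t7": "locusD",
    "t8": "locusE",
}

rate, mean_iso = summarize_as(example)
print(f"AS rate: {rate:.0%}; transcripts per AS locus: {mean_iso:.1f}")
```

With real assembly output, the mapping would have to be parsed from the assembler's own transcript-to-locus report; that parsing step is deliberately left out here because its file layout is not described in the abstract.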
Our optimized assembly parameters facilitated the first survey of AS among parasitic nematodes. The nine transcriptome assemblies, their protein translations, and basic annotations are available from Nematode.net as a resource for the research community. These should be useful for studies of specific genes and gene families of interest as well as for curating draft genome assemblies as they become available.