A new rhesus macaque assembly and annotation for next-generation sequencing analyses.

Author information

Zimin Aleksey V, Cornish Adam S, Maudhoo Mnirnal D, Gibbs Robert M, Zhang Xiongfei, Pandey Sanjit, Meehan Daniel T, Wipfler Kristin, Bosinger Steven E, Johnson Zachary P, Tharp Gregory K, Marçais Guillaume, Roberts Michael, Ferguson Betsy, Fox Howard S, Treangen Todd, Salzberg Steven L, Yorke James A, Norgren Robert B

Affiliation

Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, Nebraska 68198, USA.

Publication information

Biol Direct. 2014 Oct 14;9(1):20. doi: 10.1186/1745-6150-9-20.

Abstract

BACKGROUND

The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2, with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High-quality assembly and annotation files are required for a wide range of studies, including expression, genetic and evolutionary analyses.

RESULTS

We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice that of the rheMac2 assembly and almost five times that of the CR_1.0 assembly. The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts, which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy compared with the current annotations of rheMac2. Finally, we show that the MacaM genome provides an accurate resource for alignment of reads produced by RNA sequence expression studies.
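
For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch, assuming the common definition: the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. The abstract does not state how the authors computed it, and the function name and toy contig lengths below are hypothetical illustrations, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bases).

    Assumes the common definition: the length of the contig at which the
    running total, taken from longest to shortest, first reaches half of
    the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (not data from the paper).
toy_contigs = [80_000, 64_000, 50_000, 30_000, 20_000, 10_000]
print(n50(toy_contigs))
```

Because N50 weights each contig by its length, a handful of long contigs can raise it sharply even when many short contigs remain, which is why it is usually reported alongside other measures of completeness.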

CONCLUSIONS

The MacaM assembly and annotation files provide a substantially more complete and accurate representation of the rhesus macaque genome than rheMac2 or CR_1.0 and will serve as an important resource for investigators conducting next-generation sequencing studies with nonhuman primates.

REVIEWERS

This article was reviewed by Dr. Lutz Walter, Dr. Soojin Yi and Dr. Kateryna Makova.

