Enhancing genome assemblies by integrating non-sequence based data.

Authors

Heider Thomas N, Lindsay James, Wang Chenwei, O'Neill Rachel J, Pask Andrew J

Affiliation

Department of Molecular and Cellular Biology, University of Connecticut, 06269, Storrs CT, USA.

Publication

BMC Proc. 2011 May 28;5 Suppl 2(Suppl 2):S7. doi: 10.1186/1753-6561-5-S2-S7.

DOI:10.1186/1753-6561-5-S2-S7
PMID:21554765
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3090765/
Abstract

INTRODUCTION

Many genome projects were underway before the advent of high-throughput sequencing and have thus been supported by a wealth of genome information from other technologies. Such information frequently takes the form of linkage and physical maps, both of which can provide a substantial amount of data useful in de novo sequencing projects. Furthermore, the recent abundance of genome resources enables the use of conserved synteny maps identified in related species to further enhance genome assemblies.

METHODS

The tammar wallaby (Macropus eugenii) is a model marsupial mammal with a low-coverage genome. However, we have access to extensive comparative maps containing over 14,000 markers, constructed through the physical mapping of conserved loci, chromosome painting and comprehensive linkage maps. Using a custom Bioperl pipeline, information from the maps was aligned to assembled tammar wallaby contigs using BLAT. These data were used to construct pseudo paired-end libraries with intervals ranging from 5 to 10 Mb. We then used Bambus (a program designed to scaffold eukaryotic genomes by ordering and orienting contigs through the use of paired-end data) to scaffold the contigs with these libraries. To determine how map data compare with sequence-based approaches for enhancing assemblies, we repeated the experiment using 0.5× coverage of unique reads from 4 kb and 8 kb Illumina paired-end libraries. Finally, we combined the sequence-based and non-sequence-based data to determine how a combined approach could further enhance the quality of the low-coverage de novo reconstruction of the tammar wallaby genome.
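The step that turns map markers into pseudo paired-end links can be sketched as follows. This is a minimal illustration rather than the authors' Bioperl pipeline: the `MarkerHit` record, the `pseudo_pairs` function, the use of base-pair map positions, and the rule of pairing only neighbouring markers within 10 Mb are all assumptions made for the example. It assumes the BLAT alignments have already been reduced to one best contig per marker.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class MarkerHit:
    """One map marker and the contig it aligned to (hypothetical record)."""
    marker: str
    chrom: str
    map_pos: int   # position on the map in bp (assumed unit)
    contig: str    # contig carrying the best alignment hit


def pseudo_pairs(hits, max_gap=10_000_000):
    """Link neighbouring markers on a chromosome that hit different contigs.

    Returns (left_contig, right_contig, map_distance) tuples; max_gap is an
    assumed upper bound on the distance between linked markers.
    """
    pairs = []
    ordered = sorted(hits, key=lambda h: (h.chrom, h.map_pos))
    for _, group in groupby(ordered, key=lambda h: h.chrom):
        group = list(group)
        for left, right in zip(group, group[1:]):
            gap = right.map_pos - left.map_pos
            if left.contig != right.contig and gap <= max_gap:
                pairs.append((left.contig, right.contig, gap))
    return pairs


# Toy input: invented markers and contigs, not data from the study.
hits = [
    MarkerHit("m1", "chr1", 1_000_000, "contigA"),
    MarkerHit("m2", "chr1", 7_000_000, "contigB"),
    MarkerHit("m3", "chr1", 30_000_000, "contigC"),
    MarkerHit("m4", "chr2", 2_000_000, "contigD"),
    MarkerHit("m5", "chr2", 2_500_000, "contigD"),
]
links = pseudo_pairs(hits)
```

In this toy input only `contigA` and `contigB` are linked: `m3` lies too far from `m2`, and `m4` and `m5` fall on the same contig. Each resulting link would then be written out in whatever mate-pair format the scaffolder expects, which is not shown here.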

RESULTS

Using the map data alone, we were able to order 2.2% of the initial contigs into scaffolds and increase the N50 scaffold size to 39 kb (36 kb in the original assembly). Using only the 0.5× paired-end sequence-based data, 53% of the initial contigs were assigned to scaffolds. Combining both data sets integrated a further 2% of the initial contigs into scaffolds (55% total) and increased the N50 scaffold size by 35% over the use of sequence-based data alone.
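For readers unfamiliar with the N50 statistic reported here, the following sketch shows one common way to compute it: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembled length. The function name `n50` and the toy lengths are illustrative assumptions, not values from the study.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


# Toy example: joining the three shortest pieces into one scaffold
# leaves the total length unchanged but raises the N50.
before = n50([10, 20, 30, 40, 50])
after = n50([60, 50, 40])
```

This illustrates why linking even a small fraction of contigs can shift the N50 noticeably when the links join pieces into scaffolds near the top of the length distribution.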

CONCLUSIONS

We provide a relatively simple pipeline, built from existing bioinformatics tools, for integrating map data into a genome assembly; it is available at http://www.mcb.uconn.edu/fac.php?name=paska. Although the map data contributed only minimally to assigning the initial contigs to scaffolds in the new assembly, they greatly increased the N50 size. This process added structure to our low-coverage assembly, greatly increasing its utility in further analyses.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a49/3090765/59a25b6dc4a3/1753-6561-5-S2-S7-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a49/3090765/f1147bc92558/1753-6561-5-S2-S7-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a49/3090765/5fedbaa8e2ff/1753-6561-5-S2-S7-2.jpg
