Extensive error in the number of genes inferred from draft genome assemblies.

Authors

Denton James F, Lugo-Martinez Jose, Tucker Abraham E, Schrider Daniel R, Warren Wesley C, Hahn Matthew W

Affiliations

School of Informatics and Computing, Indiana University, Bloomington, Indiana.

Department of Biology, Indiana University, Bloomington, Indiana.

Publication information

PLoS Comput Biol. 2014 Dec 4;10(12):e1003998. doi: 10.1371/journal.pcbi.1003998. eCollection 2014 Dec.


DOI:10.1371/journal.pcbi.1003998
PMID:25474019
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4256071/
Abstract

Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process.
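
The family-level comparison described in the abstract can be illustrated with a minimal sketch. It assumes per-family gene counts are already available for a draft assembly and for a higher-quality assembly of the same genome. The function name `compare_family_counts`, the family labels, and the toy counts are hypothetical and are not taken from the paper; this is not the authors' pipeline.

```python
from typing import Dict


def compare_family_counts(
    draft: Dict[str, int], reference: Dict[str, int]
) -> Dict[str, float]:
    """Compare per-family gene counts in a draft assembly against a reference.

    A family missing from one input is treated as having zero genes there.
    """
    families = set(draft) | set(reference)
    # Families where the draft annotation has more genes than the reference.
    gained = sum(1 for f in families if draft.get(f, 0) > reference.get(f, 0))
    # Families where the draft annotation has fewer genes than the reference.
    lost = sum(1 for f in families if draft.get(f, 0) < reference.get(f, 0))
    total = len(families)
    return {
        "gained": gained,
        "lost": lost,
        "fraction_wrong": (gained + lost) / total if total else 0.0,
    }


# Hypothetical toy counts, for illustration only.
reference_counts = {"famA": 3, "famB": 1, "famC": 2, "famD": 5, "famE": 1}
draft_counts = {"famA": 4, "famB": 1, "famC": 1, "famD": 5, "famE": 1}

result = compare_family_counts(draft_counts, reference_counts)
print(result)
```

Counting gains and losses separately mirrors the abstract's observation that draft assemblies both add and subtract genes, so a net change in total gene number can understate the number of affected families.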


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0254/4256071/79939d33e01c/pcbi.1003998.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0254/4256071/628d1dde9760/pcbi.1003998.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0254/4256071/342c3576d102/pcbi.1003998.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0254/4256071/209792444e1e/pcbi.1003998.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0254/4256071/a30ebaf852b7/pcbi.1003998.g005.jpg
