Duplication count distributions in DNA sequences.

Author information

Sindi Suzanne S, Hunt Brian R, Yorke James A

Affiliation

Institute for Physical Sciences and Technology, University of Maryland, College Park, Maryland 20742, USA.

Publication information

Phys Rev E Stat Nonlin Soft Matter Phys. 2008 Dec;78(6 Pt 1):061912. doi: 10.1103/PhysRevE.78.061912. Epub 2008 Dec 11.

Abstract

We investigate quantitative features of complex repetitive DNA in several genomes by examining sequences long enough that they are unlikely to have repeated by chance. For each genome, we determine the number of identical copies, the "duplication count," of each sequence of length 40, that is, of each "40-mer." We say a 40-mer is "repeated" if its duplication count is at least 2. We focus mainly on "complex" 40-mers, those without short internal repetitions. Most complex repeated 40-mers fall into one of two categories: in one, the copies are clustered closely together on a single chromosome; in the other, the copies are distributed widely across multiple chromosomes. For each genome and each category, we compute N(c), the number of 40-mers with duplication count c, for each integer c. In every case, N(c) shows a power-law-like decay as c increases from 3 to 50 or higher. In particular, N(c) decays much more slowly than predicted by evolutionary models in which every 40-mer is equally likely to be duplicated. We also analyze an evolutionary model that does reflect the slow decay of N(c).
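The counting steps described in the abstract (duplication counts of 40-mers, a "complex" filter, and the distribution N(c)) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: all function names are hypothetical, k-mers are counted on the given strand only (no reverse complements), k-mers containing `N` are skipped, and `is_complex` uses a made-up criterion (no periodic stretch of length 12 or more with period at most 4) because the abstract only says "without short internal repetitions." `loglog_slope` is a crude least-squares check of power-law-like decay, not the fitting method used in the paper. The clustered/dispersed classification is omitted because it needs chromosome coordinates.

```python
import math
from collections import Counter

K = 40  # k-mer length used in the abstract


def kmer_counts(seq: str, k: int = K) -> Counter:
    """Duplication count of every k-mer in seq.

    Assumptions (not from the paper): exact matches on the given strand only,
    no reverse complements, and k-mers containing 'N' are skipped.
    """
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            counts[kmer] += 1
    return counts


def is_complex(kmer: str, max_period: int = 4, min_run: int = 12) -> bool:
    """Hypothetical complexity test: reject a k-mer if it contains a stretch of
    length >= min_run that is periodic with some period p <= max_period."""
    for p in range(1, max_period + 1):
        run = 0  # consecutive positions j with kmer[j] == kmer[j + p]
        for j in range(len(kmer) - p):
            if kmer[j] == kmer[j + p]:
                run += 1
                if run >= min_run - p:  # a stretch of length run + p is p-periodic
                    return False
            else:
                run = 0
    return True


def duplication_distribution(counts: Counter, min_count: int = 2) -> dict:
    """N(c): number of distinct k-mers with duplication count c, for c >= min_count."""
    n_of_c = Counter(c for c in counts.values() if c >= min_count)
    return dict(sorted(n_of_c.items()))


def loglog_slope(n_of_c: dict) -> float:
    """Least-squares slope of log N(c) against log c (needs >= 2 distinct c)."""
    xs = [math.log(c) for c in n_of_c]
    ys = [math.log(n) for n in n_of_c.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy usage with k = 4 so the example stays readable.
toy = kmer_counts("ACGTACGTACGT", k=4)
complex_toy = Counter({s: c for s, c in toy.items() if is_complex(s)})
print(duplication_distribution(complex_toy))  # prints {2: 3, 3: 1}
```

A whole genome contains billions of 40-mers, so an in-memory `Counter` like this only suits small sequences; genome-scale counting is normally done with dedicated k-mer counting tools.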
