The frequency distribution of gene family sizes in complete genomes.

Authors

Huynen M A, van Nimwegen E

Affiliation

Santa Fe Institute, New Mexico, USA.

Publication

Mol Biol Evol. 1998 May;15(5):583-9. doi: 10.1093/oxfordjournals.molbev.a025959.

Abstract

We compare the frequency distribution of gene family sizes in the complete genomes of six bacteria (Escherichia coli, Haemophilus influenzae, Helicobacter pylori, Mycoplasma genitalium, Mycoplasma pneumoniae, and Synechocystis sp. PCC6803), two Archaea (Methanococcus jannaschii and Methanobacterium thermoautotrophicum), one eukaryote (Saccharomyces cerevisiae), the vaccinia virus, and the bacteriophage T4. The sizes of the gene families versus their frequencies show power-law distributions that tend to become flatter (have a larger exponent) as the number of genes in the genome increases. Power-law distributions generally occur as the limit distribution of a multiplicative stochastic process with a boundary constraint. We discuss various models that can account for a multiplicative process determining the sizes of gene families in the genome. In particular, we argue that, in order to explain the observed distributions, gene families have to behave in a coherent fashion within the genome; i.e., the probabilities of duplications of genes within a gene family are not independent of each other. Likewise, the probabilities of deletions of genes within a gene family are not independent of each other.
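
As a rough illustration of the claim that a multiplicative stochastic process with a boundary constraint yields a power-law distribution, the following minimal sketch simulates such a process. It is not the authors' model: the log-normal growth factor, the parameter values (`drift`, `sigma`, `n_families`, `n_steps`), and the function names are assumptions chosen for illustration only. Each family's size is multiplied by one random factor per step, which treats the family as a coherent unit, and a floor at one gene serves as the boundary.

```python
import math
import random
from collections import Counter


def simulate_family_sizes(n_families=2000, n_steps=200, drift=-0.05, sigma=0.3, seed=0):
    """Evolve family sizes by random multiplicative factors with a floor at 1.

    All parameter values are hypothetical; they are not taken from the paper.
    """
    rng = random.Random(seed)
    sizes = [1.0] * n_families
    for _ in range(n_steps):
        for i in range(n_families):
            # One factor per family per step: the whole family grows or shrinks together.
            factor = math.exp(rng.gauss(drift, sigma))
            # Boundary constraint: a family cannot fall below one gene.
            sizes[i] = max(1.0, sizes[i] * factor)
    # Truncate to whole gene counts.
    return [int(s) for s in sizes]


def loglog_slope(sizes):
    """Least-squares slope of log(frequency) against log(family size)."""
    counts = Counter(sizes)
    xs = [math.log(size) for size in counts]
    ys = [math.log(count) for count in counts.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    sizes = simulate_family_sizes()
    print("largest family:", max(sizes))
    print("log-log slope:", loglog_slope(sizes))
```

A negative slope on the log-log plot indicates that large families are rarer than small ones. A plain least-squares fit over all observed sizes is a crude estimate of the exponent, since the sparse tail of the histogram biases it; it is used here only to keep the sketch short.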
