Genome sizes and the Benford distribution.

Affiliation

Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America.

Publication information

PLoS One. 2012;7(5):e36624. doi: 10.1371/journal.pone.0036624. Epub 2012 May 18.

DOI:10.1371/journal.pone.0036624
PMID:22629319
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3356352/
Abstract

BACKGROUND

Data on the number of Open Reading Frames (ORFs) coded by genomes from the 3 domains of Life show the presence of some notable general features. These include essential differences between the Prokaryotes and Eukaryotes, with the number of ORFs growing linearly with total genome size for the former, but only logarithmically for the latter.

RESULTS

Simply by assuming that the (protein) coding and non-coding fractions of the genome must have different dynamics and that the non-coding fraction must be particularly versatile and therefore be controlled by a variety of (unspecified) probability distribution functions (pdf's), we are able to predict that the number of ORFs for Eukaryotes follows a Benford distribution and must therefore have a specific logarithmic form. Using the data for the 1000+ genomes available to us in early 2010, we find that the Benford distribution provides excellent fits to the data over several orders of magnitude.
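For orientation, the best-known statement of Benford's law is the first-digit form, in which a leading decimal digit d occurs with probability log10(1 + 1/d). The abstract does not say which form of the distribution the authors fit, so the sketch below shows only this textbook version and should not be read as the paper's fitting function. The function name is chosen here for illustration.

```python
import math


def benford_first_digit(d: int) -> float:
    """Probability that the leading decimal digit equals d under Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be in 1..9")
    return math.log10(1 + 1 / d)


# Tabulate the nine first-digit probabilities; they sum to 1.
probs = {d: benford_first_digit(d) for d in range(1, 10)}
for d, p in probs.items():
    print(f"{d}: {p:.3f}")
```

The logarithm in this formula is the same kind of logarithmic dependence that the abstract attributes to the Eukaryote ORF counts.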

CONCLUSIONS

In its linear regime the Benford distribution produces excellent fits to the Prokaryote data, while the full non-linear form of the distribution similarly provides an excellent fit to the Eukaryote data. Furthermore, in their region of overlap the salient features are statistically congruent. This allows us to interpret the difference between Prokaryotes and Eukaryotes as the manifestation of the increased demand in the biological functions required for the larger Eukaryotes, to estimate some minimal genome sizes, and to predict a maximal Prokaryote genome size on the order of 8-12 megabasepairs. These results naturally allow a mathematical interpretation in terms of maximal entropy and, therefore, most efficient information transmission.
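One way to picture a single curve that is linear for small genomes and logarithmic for large ones is a function of the form a·ln(1 + G/G0). This is an illustrative assumption, not the functional form or the fitted parameters reported in the paper: `scale` and `crossover_mb` are hypothetical values chosen only to make the two regimes visible.

```python
import math


def orf_count(genome_size_mb: float,
              scale: float = 1000.0,        # hypothetical amplitude
              crossover_mb: float = 10.0    # hypothetical crossover size, not a fitted value
              ) -> float:
    """Illustrative curve: linear when G << crossover, logarithmic when G >> crossover."""
    return scale * math.log1p(genome_size_mb / crossover_mb)


# Small-genome regime: the curve tracks the straight line scale * G / crossover.
small = 0.01
linear_approx = 1000.0 * small / 10.0
print(orf_count(small) / linear_approx)

# Large-genome regime: doubling the genome adds roughly a constant, scale * ln(2).
large = 1.0e4
print(orf_count(2 * large) - orf_count(large), 1000.0 * math.log(2))
```

Under this assumed form, the crossover scale separates the two regimes, which is the kind of quantity the abstract's predicted maximal Prokaryote genome size would correspond to.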
