Genome inhomogeneity is determined mainly by WW and SS dinucleotides.

Authors

Kozhukhin C G, Pevzner P A

Affiliation

Institute of Control Sciences, Moscow, USSR.

Publication

Comput Appl Biosci. 1991 Jan;7(1):39-49. doi: 10.1093/bioinformatics/7.1.39.

DOI: 10.1093/bioinformatics/7.1.39
PMID: 2004273

Abstract

According to the hypothesis of the modular structure of DNA, genomes consist of modules of various nature which may differ in statistical characteristics. Statistical analysis helps in revealing the differences in statistical characteristics and predicting the modular structure. In this connection the question about the contribution of each word of length l (l-tuple) to the inhomogeneity of genetic text arises. The notion of stationary (i.e. relatively evenly distributed over a genome) versus non-stationary l-tuples has been introduced previously. In this paper, the dinucleotide distributions for all long sequences from GenBank were analyzed and it was shown that non-stationary dinucleotides are closely associated with polyW and polyS tracts (W denotes 'weak' nucleotides A or T, while S stands for the 'strong' nucleotides G or C). Thus, genome inhomogeneity is shown to be determined mainly by AA, TT, GG, CC, AT, TA, GC and CG dinucleotides. It has been demonstrated that neither 'codon usage' nor the 'isochore model' can account for this phenomenon.
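The abstract does not state the statistic the authors used to separate stationary from non-stationary dinucleotides, so the following is only a rough sketch of the idea, not their method. It assumes a simple stand-in measure: the variance-to-mean ratio of each dinucleotide's count across consecutive fixed-size windows. The names `dinucleotide_class`, `window_counts` and `dispersion`, and the choice of window size, are hypothetical and introduced here for illustration.

```python
from collections import Counter

WEAK = set("AT")    # W: 'weak' nucleotides, as defined in the abstract
STRONG = set("GC")  # S: 'strong' nucleotides, as defined in the abstract


def dinucleotide_class(pair):
    """Return 'WW', 'SS', or 'mixed' for a two-letter DNA word."""
    a, b = pair
    if a in WEAK and b in WEAK:
        return "WW"
    if a in STRONG and b in STRONG:
        return "SS"
    return "mixed"


def window_counts(seq, window):
    """Count overlapping dinucleotides inside consecutive non-overlapping windows."""
    counts = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        counts.append(Counter(chunk[i:i + 2] for i in range(len(chunk) - 1)))
    return counts


def dispersion(seq, window):
    """Variance-to-mean ratio of each dinucleotide's per-window count.

    This is an assumed, illustrative unevenness measure: values near 0 mean
    the word is spread evenly over the windows, larger values mean it is
    concentrated in some windows. Words that never occur are omitted.
    """
    per_window = window_counts(seq, window)
    n = len(per_window)
    if n == 0:
        raise ValueError("sequence is shorter than one window")
    result = {}
    for pair in (a + b for a in "ACGT" for b in "ACGT"):
        values = [c[pair] for c in per_window]
        mean = sum(values) / n
        if mean == 0:
            continue
        variance = sum((v - mean) ** 2 for v in values) / n
        result[pair] = variance / mean
    return result


# The eight WW and SS dinucleotides named in the abstract.
WW_SS = sorted(a + b for a in "ACGT" for b in "ACGT"
               if dinucleotide_class(a + b) != "mixed")
```

Under these assumptions, a sequence made of a polyW tract followed by a polyS tract, such as `"AAAACCCC"` with a window of 4, gives a non-zero ratio for `AA` and `CC`, while a periodic sequence such as `"ACGTACGT"` gives zero for every dinucleotide it contains.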

Similar articles

1
Genome inhomogeneity is determined mainly by WW and SS dinucleotides.
Comput Appl Biosci. 1991 Jan;7(1):39-49. doi: 10.1093/bioinformatics/7.1.39.
2
Linguistics of nucleotide sequences. II: Stationary words in genetic texts and the zonal structure of DNA.
J Biomol Struct Dyn. 1989 Apr;6(5):1027-38. doi: 10.1080/07391102.1989.10506529.
3
Estimating the repeat structure and length of DNA sequences using L-tuples.
Genome Res. 2003 Aug;13(8):1916-22. doi: 10.1101/gr.1251803.
4
[Statistical characteristics in primary structures of functional regions of Escherichia coli genome. II. Non-stationary Markov chains].
Mol Biol (Mosk). 1986 Jul-Aug;20(4):1024-33.
5
seq++: analyzing biological sequences with a range of Markov-related models.
Bioinformatics. 2005 Jun 1;21(11):2783-4. doi: 10.1093/bioinformatics/bti389. Epub 2005 Mar 17.
6
Monte Carlo estimation of total variation distance of Markov chains on large spaces, with application to phylogenetics.
Stat Appl Genet Mol Biol. 2013 Mar 26;12(1):39-48. doi: 10.1515/sagmb-2012-0023.
7
Super paramagnetic clustering of protein sequences.
BMC Bioinformatics. 2005 Apr 1;6:82. doi: 10.1186/1471-2105-6-82.
8
A Monte Carlo EM algorithm for de novo motif discovery in biomolecular sequences.
IEEE/ACM Trans Comput Biol Bioinform. 2009 Jul-Sep;6(3):370-86. doi: 10.1109/TCBB.2008.103.
9
RNA folding kinetics using Monte Carlo and Gillespie algorithms.
J Math Biol. 2018 Apr;76(5):1195-1227. doi: 10.1007/s00285-017-1169-7. Epub 2017 Aug 5.
10
A derived Markov process for modeling reaction networks.
Evol Comput. 2003 Winter;11(4):339-62. doi: 10.1162/106365603322519260.

Cited by

1
Classification of COVID-19 and Other Pathogenic Sequences: A Dinucleotide Frequency and Machine Learning Approach.
IEEE Access. 2020 Oct 15;8:195263-195273. doi: 10.1109/ACCESS.2020.3031387. eCollection 2020.
2
Conservation vs. variation of dinucleotide frequencies across bacterial and archaeal genomes: evolutionary implications.
Front Microbiol. 2013 Sep 6;4:269. doi: 10.3389/fmicb.2013.00269. eCollection 2013.
3
The frequency of two-base tracts in eukaryotic genomes.
J Mol Evol. 1993 Aug;37(2):123-30. doi: 10.1007/BF02407347.
4
Comparative DNA sequence features in two long Escherichia coli contigs.
Nucleic Acids Res. 1993 Aug 11;21(16):3875-84. doi: 10.1093/nar/21.16.3875.
5
Symmetry observations in long nucleotide sequences.
Nucleic Acids Res. 1993 Jun 25;21(12):2797-800. doi: 10.1093/nar/21.12.2797.
6
Over- and under-representation of short oligonucleotides in DNA sequences.
Proc Natl Acad Sci U S A. 1992 Feb 15;89(4):1358-62. doi: 10.1073/pnas.89.4.1358.
7
WORDUP: an efficient algorithm for discovering statistically significant patterns in DNA sequences.
Nucleic Acids Res. 1992 Jun 11;20(11):2871-5. doi: 10.1093/nar/20.11.2871.
8
Information contents and dinucleotide compositions of plant intron sequences vary with evolutionary origin.
Plant Mol Biol. 1992 Sep;19(6):1057-64. doi: 10.1007/BF00040537.