Exceptional motifs in different Markov chain models for a statistical analysis of DNA sequences.

Authors

Schbath S, Prum B, de Turckheim E

Affiliation

INRA, Département de Biométrie et Intelligence Artificielle, Jouy-en-Josas, France.

Publication information

J Comput Biol. 1995 Fall;2(3):417-37. doi: 10.1089/cmb.1995.2.417.

DOI: 10.1089/cmb.1995.2.417
PMID: 8521272
Abstract

Identifying exceptional motifs is often used for extracting information from long DNA sequences. The two difficulties of the method are the choice of the model that defines the expected frequencies of words and the approximation of the variance of the difference T(W) between the number of occurrences of a word W and its estimation. We consider here different Markov chain models, either with stationary or periodic transition probabilities. We estimate the variance of the difference T(W) by the conditional variance of the number of occurrences of W given the oligonucleotides counts that define the model. Two applications show how to use asymptotically standard normal statistics associated with the counts to describe a given sequence in terms of its outlying words. Sequences of Escherichia coli and of Bacillus subtilis are compared with respect to their exceptional tri- and tetranucleotides. For both bacteria, exceptional 3-words are mainly found in the coding frame. E. coli palindrome counts are analyzed in different models, showing that many overabundant words are one-letter mutations of avoided palindromes.
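As a rough illustration of the kind of statistic the abstract describes, the Python sketch below scores one word under a stationary Markov model of maximal order (order m-2 for a word of length m). The expected count is estimated from the counts of the word's two (m-1)-letter subwords and its (m-2)-letter core, and the variance uses a form commonly quoted for this conditional approach. Neither formula is spelled out in the abstract, so both are assumptions here, and the function names `count_words` and `word_zscore` are hypothetical. The periodic (phase-dependent) models mentioned in the abstract are not covered.

```python
from collections import Counter
from math import sqrt


def count_words(seq, k):
    """Count overlapping occurrences of every k-letter word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def word_zscore(seq, word):
    """Approximate z-score of `word` under a Markov model of order len(word) - 2.

    Assumed estimators (not taken from the abstract itself):
      expected = N(prefix) * N(suffix) / N(core)
      variance = expected * (N(core) - N(prefix)) * (N(core) - N(suffix)) / N(core)^2
    Returns None when the score is undefined (zero core count or zero variance).
    """
    m = len(word)
    if m < 3:
        raise ValueError("word must have at least 3 letters")

    observed = count_words(seq, m)[word]
    shorter = count_words(seq, m - 1)
    n_prefix = shorter[word[:-1]]              # word without its last letter
    n_suffix = shorter[word[1:]]               # word without its first letter
    n_core = count_words(seq, m - 2)[word[1:-1]]  # word without both end letters

    if n_core == 0:
        return None
    expected = n_prefix * n_suffix / n_core
    variance = expected * (n_core - n_prefix) * (n_core - n_suffix) / n_core ** 2
    if variance <= 0:
        return None
    return (observed - expected) / sqrt(variance)
```

With a sequence held in a string, say a hypothetical `genome`, a call such as `word_zscore(genome, "CTAG")` would return a negative value when the word is rarer than its shorter subwords suggest. Words with a large absolute score are the candidates for "exceptional" in the sense of the abstract.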

Similar articles

1. Exceptional motifs in different Markov chain models for a statistical analysis of DNA sequences.
   J Comput Biol. 1995 Fall;2(3):417-37. doi: 10.1089/cmb.1995.2.417.
2. An overview on the distribution of word counts in Markov chains.
   J Comput Biol. 2000 Feb-Apr;7(1-2):193-201. doi: 10.1089/10665270050081469.
3. Probabilistic and statistical properties of words: an overview.
   J Comput Biol. 2000 Feb-Apr;7(1-2):1-46. doi: 10.1089/10665270050081360.
4. Bacterial genomes lacking long-range correlations may not be modeled by low-order Markov chains: the role of mixing statistics and frame shift of neighboring genes.
   Comput Biol Chem. 2014 Dec;53 Pt A:15-25. doi: 10.1016/j.compbiolchem.2014.08.005. Epub 2014 Aug 30.
5. [Statistical characteristics of primary structures of the functional regions of the Escherichia coli genome. III. Computer recognition of coding regions].
   Mol Biol (Mosk). 1986 Sep-Oct;20(5):1390-8.
6. First and second moment of counts of words in random texts generated by Markov chains.
   Comput Appl Biosci. 1992 Oct;8(5):433-41. doi: 10.1093/bioinformatics/8.5.433.
7. [Statistical characteristics in primary structures of functional regions of Escherichia coli genome. II. Non-stationary Markov chains].
   Mol Biol (Mosk). 1986 Jul-Aug;20(4):1024-33.
8. Drifting Markov models with polynomial drift and applications to DNA sequences.
   Stat Appl Genet Mol Biol. 2008;7(1):Article6. doi: 10.2202/1544-6115.1326. Epub 2008 Feb 21.
9. Distribution of potential type II restriction sites (palindromes) in prokaryotes.
   Biochem Biophys Res Commun. 2003 Oct 17;310(2):280-5. doi: 10.1016/j.bbrc.2003.09.014.
10. Determination of bias in the relative abundance of oligonucleotides in DNA sequences.
    J Comput Biol. 2001;8(2):151-75. doi: 10.1089/106652701300312922.

Cited by

1. Evolution of Chi motifs in Proteobacteria.
   G3 (Bethesda). 2021 Jan 18;11(1). doi: 10.1093/g3journal/jkaa054.
2. Oral microbiome and pancreatic cancer.
   World J Gastroenterol. 2020 Dec 28;26(48):7679-7692. doi: 10.3748/wjg.v26.i48.7679.
3. Evolutionary selection against short nucleotide sequences in viruses and their related hosts.
   DNA Res. 2020 Apr 1;27(2). doi: 10.1093/dnares/dsaa008.
4. A high-resolution genomic composition-based method with the ability to distinguish similar bacterial organisms.
   BMC Genomics. 2019 Oct 21;20(1):754. doi: 10.1186/s12864-019-6119-x.
5. Single-molecule sequencing detection of N6-methyladenine in microbial reference materials.
   Nat Commun. 2019 Feb 4;10(1):579. doi: 10.1038/s41467-019-08289-9.
6. Characterization of Uncultured Genome Fragment from Soil Metagenomic Library Exposed Rare Mismatch of Internal Tetranucleotide Frequency.
   Front Microbiol. 2016 Dec 22;7:2081. doi: 10.3389/fmicb.2016.02081. eCollection 2016.
7. Lifespan of restriction-modification systems critically affects avoidance of their recognition sites in host genomes.
   BMC Genomics. 2015 Dec 21;16:1084. doi: 10.1186/s12864-015-2288-4.
8. Palindromes in SARS and Other Coronaviruses.
   INFORMS J Comput. 2004 Fall;16(4):331-340. doi: 10.1287/ijoc.1040.0087.
9. Sequence analysis by iterated maps, a review.
   Brief Bioinform. 2014 May;15(3):369-75. doi: 10.1093/bib/bbt072. Epub 2013 Oct 25.
10. Manipulating or superseding host recombination functions: a dilemma that shapes phage evolvability.
    PLoS Genet. 2013;9(9):e1003825. doi: 10.1371/journal.pgen.1003825. Epub 2013 Sep 26.