Unsupervised statistical discovery of spaced motifs in prokaryotic genomes.

Authors

Tong Hao, Schliekelman Paul, Mrázek Jan

Affiliations

Department of Statistics, University of Georgia, Athens, GA, 30602, USA.

Department of Microbiology and Institute of Bioinformatics, University of Georgia, Athens, GA, 30602, USA.

Publication

BMC Genomics. 2017 Jan 5;18(1):27. doi: 10.1186/s12864-016-3400-0.

DOI: 10.1186/s12864-016-3400-0
PMID: 28056763
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5217627/

Abstract

BACKGROUND

DNA sequences contain repetitive motifs which have various functions in the physiology of the organism. A number of methods have been developed for discovery of such sequence motifs with a primary focus on detection of regulatory motifs and particularly transcription factor binding sites. Most motif-finding methods apply probabilistic models to detect motifs characterized by unusually high number of copies of the motif in the analyzed sequences.

RESULTS

We present a novel method for detection of pairs of motifs separated by spacers of variable nucleotide sequence but conserved length. Unlike existing methods for motif discovery, the motifs themselves are not required to occur at unusually high frequency but only to exhibit a significant preference to occur at a specific distance from each other. In the present implementation of the method, motifs are represented by pentamers and all pairs of pentamers are evaluated for statistically significant preference for a specific distance. An important step of the algorithm eliminates motif pairs where the spacers separating the two motifs exhibit a high degree of sequence similarity; such motif pairs likely arise from duplications of the whole segment including the motifs and the spacer rather than due to selective constraints indicative of a functional importance of the motif pair. The method was used to scan 569 complete prokaryotic genomes for novel sequence motifs. Some motifs detected were previously known but other motifs found in the search appear to be novel. Selected motif pairs were subjected to further investigation and in some cases their possible biological functions were proposed.
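
The two steps described above (testing a pentamer pair for a preferred spacer length, then discarding pairs whose spacers are nearly identical) can be illustrated with a minimal Python sketch. The abstract does not specify the statistical test or the similarity measure, so everything here is an assumption made for illustration: the function names, the `max_spacer` window, the ratio of the top count to the mean count at other spacer lengths (a crude stand-in for a significance test), and the mean pairwise identity used as the spacer-similarity score.

```python
from collections import Counter, defaultdict
from itertools import combinations

K = 5  # motif length: pentamers, as stated in the abstract


def spacer_counts(seq, first, second, max_spacer=30):
    """Count how often `second` follows `first` at each spacer length.

    Returns (counts, spacers): counts maps spacer length -> occurrences,
    spacers maps spacer length -> list of the intervening sequences.
    `max_spacer` is a hypothetical window, not a value from the paper.
    """
    second_starts = {i for i in range(len(seq) - K + 1) if seq[i:i + K] == second}
    counts = Counter()
    spacers = defaultdict(list)
    for i in range(len(seq) - K + 1):
        if seq[i:i + K] != first:
            continue
        for d in range(max_spacer + 1):
            j = i + K + d  # start of the second motif for spacer length d
            if j in second_starts:
                counts[d] += 1
                spacers[d].append(seq[j - d:j])
    return counts, spacers


def preferred_spacer(counts, max_spacer=30):
    """Return (length, count, ratio) for the most frequent spacer length.

    `ratio` compares the top count with the mean count over all other
    lengths; it is an illustrative score, not the paper's statistic.
    """
    if not counts:
        return None
    best, n = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    others = [counts.get(d, 0) for d in range(max_spacer + 1) if d != best]
    background = sum(others) / len(others)
    ratio = n / background if background > 0 else float("inf")
    return best, n, ratio


def mean_spacer_identity(spacer_list):
    """Mean fraction of identical positions over all pairs of spacers.

    Values near 1.0 suggest duplicated segments rather than a conserved
    motif pair; any rejection threshold would be an assumption.
    """
    pairs = list(combinations(spacer_list, 2))
    if not pairs or not spacer_list[0]:
        return 0.0
    length = len(spacer_list[0])
    total = sum(sum(a == b for a, b in zip(x, y)) / length for x, y in pairs)
    return total / len(pairs)
```

In this sketch a pair would be kept only if its preferred spacer length stands out from the background and `mean_spacer_identity` for that length stays low. A genome-wide scan would repeat this over all pentamer pairs and would need a proper significance test with multiple-testing control, which the sketch does not attempt.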

CONCLUSIONS

We present a new motif-finding technique that is applicable to scanning complete genomes for sequence motifs. The results from analysis of 569 genomes suggest that the method detects previously known motifs that are expected to be found as well as new motifs that are unlikely to be discovered by traditional motif-finding methods. We conclude that our approach to detection of significant motif pairs can complement existing motif-finding techniques in discovery of novel functional sequence motifs in complete genomes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3480/5217627/3820b39cff28/12864_2016_3400_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3480/5217627/ac77490ff319/12864_2016_3400_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3480/5217627/a0667e2506bb/12864_2016_3400_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3480/5217627/512d564da169/12864_2016_3400_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3480/5217627/b01e7a18a5d6/12864_2016_3400_Fig5_HTML.jpg
