WordCluster: detecting clusters of DNA words and genomic elements.

Authors

Hackenberg Michael, Carpena Pedro, Bernaola-Galván Pedro, Barturen Guillermo, Alganza Angel M, Oliver José L

Affiliations

Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, Campus de Fuentenueva s/n, 18071-Granada & Lab. de Bioinformática, Centro de Investigación Biomédica, PTS, Avda. del Conocimiento s/n, 18100-Granada, Spain.

Publication information

Algorithms Mol Biol. 2011 Jan 24;6:2. doi: 10.1186/1748-7188-6-2.

DOI:10.1186/1748-7188-6-2

PMID:21261981

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3037320/

Abstract

BACKGROUND

Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well-established examples are genes, transcription factor binding sites (TFBSs), CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. Instead, cluster detection often relies on densities and sliding-window approaches, or on arbitrarily chosen distance thresholds.

RESULTS

We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method as a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between the inside and the outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome.
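The distance-based idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed `max_dist` threshold, and the significance model (each position starts a copy independently with probability `p`, giving a negative binomial tail) are all assumptions chosen for illustration. The abstract does not specify how WordCluster derives its threshold or p-value.

```python
from math import comb


def find_positions(seq, word):
    """Return 0-based start positions of every (possibly overlapping) copy of word."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)
    return positions


def chain_clusters(positions, max_dist, min_copies=2):
    """Group consecutive copies whose start-to-start distance is <= max_dist.

    max_dist is a hypothetical, user-supplied threshold.
    """
    clusters, current = [], positions[:1]
    for prev, pos in zip(positions, positions[1:]):
        if pos - prev <= max_dist:
            current.append(pos)
        else:
            if len(current) >= min_copies:
                clusters.append(current)
            current = [pos]
    if len(current) >= min_copies:
        clusters.append(current)
    return clusters


def cluster_p_value(cluster, p):
    """Probability of seeing the cluster's remaining copies within its span by chance.

    Assumed null model: every position starts a copy independently with
    probability p. With r copies after the first one and a span of `span`
    positions, this sums the negative binomial probabilities of k failures
    before the r-th success for all k <= span - r.
    """
    r = len(cluster) - 1
    span = cluster[-1] - cluster[0]
    return sum(comb(k + r - 1, k) * p**r * (1 - p)**k for k in range(span - r + 1))


# Toy sequence: three adjacent CAG copies, a long gap, then one isolated copy.
sequence = "CAGCAGCAG" + "T" * 50 + "CAG"
positions = find_positions(sequence, "CAG")
clusters = chain_clusters(positions, max_dist=3)
p_values = [cluster_p_value(c, p=0.1) for c in clusters]
```

In this sketch a cluster would be reported only if its p-value falls below a chosen cutoff. For real data, `p` would have to be estimated from the sequence (for example, word count divided by sequence length), which is again an assumption rather than something stated in the abstract.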

CONCLUSIONS

WordCluster seems to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. A web-server implementation of the method is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php, with additional features such as the detection of co-localization with gene regions and an annotation enrichment tool for functional analysis of the overlapped genes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91ee/3037320/7b9c10064f05/1748-7188-6-2-1.jpg

Similar articles

Clustering of DNA words and biological function: a proof of principle.

J Theor Biol. 2012 Mar 21;297:127-36. doi: 10.1016/j.jtbi.2011.12.024. Epub 2011 Dec 30.

Prediction of CpG-island function: CpG clustering vs. sliding-window methods.

BMC Genomics. 2010 May 26;11:327. doi: 10.1186/1471-2164-11-327.

Prediction of CpG Islands as an Intrinsic Clustering Property Found in Many Eukaryotic DNA Sequences and Its Relation to DNA Methylation.

Methods Mol Biol. 2018;1766:31-47. doi: 10.1007/978-1-4939-7768-0_3.

CpGcluster: a distance-based algorithm for CpG-island detection.

BMC Bioinformatics. 2006 Oct 12;7:446. doi: 10.1186/1471-2105-7-446.

DNA clustering and genome complexity.

Comput Biol Chem. 2014 Dec;53 Pt A:71-8. doi: 10.1016/j.compbiolchem.2014.08.011. Epub 2014 Aug 23.

NGSmethDB: a database for next-generation sequencing single-cytosine-resolution DNA methylation data.

Nucleic Acids Res. 2011 Jan;39(Database issue):D75-9. doi: 10.1093/nar/gkq942. Epub 2010 Oct 21.

LCGbase: A Comprehensive Database for Lineage-Based Co-regulated Genes.

Evol Bioinform Online. 2012;8:39-46. doi: 10.4137/EBO.S8540. Epub 2011 Dec 13.

TTS mapping: integrative WEB tool for analysis of triplex formation target DNA sequences, G-quadruplets and non-protein coding regulatory DNA elements in the human genome.

BMC Genomics. 2009 Dec 3;10 Suppl 3(Suppl 3):S9. doi: 10.1186/1471-2164-10-S3-S9.

GREAM: A Web Server to Short-List Potentially Important Genomic Repeat Elements Based on Over-/Under-Representation in Specific Chromosomal Locations, Such as the Gene Neighborhoods, within or across 17 Mammalian Species.

PLoS One. 2015 Jul 24;10(7):e0133647. doi: 10.1371/journal.pone.0133647. eCollection 2015.

Cited by

PCGIMA: developing the web server for human position-defined CpG islands methylation analysis.

Front Genet. 2024 Mar 13;15:1367731. doi: 10.3389/fgene.2024.1367731. eCollection 2024.

Transfer Learning Allows Accurate RBP Target Site Prediction with Limited Sample Sizes.

Biology (Basel). 2023 Sep 25;12(10):1276. doi: 10.3390/biology12101276.

A review of computational algorithms for CpG islands detection.

J Biosci. 2019 Dec;44(6).

Genome-Wide Profiling of DNA Methyltransferases in Mammalian Cells.

Methods Mol Biol. 2018;1766:157-174. doi: 10.1007/978-1-4939-7768-0_9.

Distinguishing Functional DNA Words; A Method for Measuring Clustering Levels.

Sci Rep. 2017 Jan 27;7:41543. doi: 10.1038/srep41543.

CpGislandEVO: a database and genome browser for comparative evolutionary genomics of CpG islands.

Biomed Res Int. 2013;2013:709042. doi: 10.1155/2013/709042. Epub 2013 Sep 25.

References

Prediction of CpG-island function: CpG clustering vs. sliding-window methods.

BMC Genomics. 2010 May 26;11:327. doi: 10.1186/1471-2164-11-327.

BEDTools: a flexible suite of utilities for comparing genomic features.

Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28.

Algorithms and methods for correlating experimental results with annotation databases.

Methods Mol Biol. 2010;593:315-40. doi: 10.1007/978-1-60327-194-3_15.

Human DNA methylomes at base resolution show widespread epigenomic differences.

Nature. 2009 Nov 19;462(7271):315-22. doi: 10.1038/nature08514. Epub 2009 Oct 14.

Level statistics of words: finding keywords in literary texts and symbolic sequences.

Phys Rev E Stat Nonlin Soft Matter Phys. 2009 Mar;79(3 Pt 2):035102. doi: 10.1103/PhysRevE.79.035102. Epub 2009 Mar 10.

Ensembl 2009.

Nucleic Acids Res. 2009 Jan;37(Database issue):D690-7. doi: 10.1093/nar/gkn828. Epub 2008 Nov 25.

Annotation-Modules: a tool for finding significant combinations of multisource annotations for gene lists.

Bioinformatics. 2008 Jun 1;24(11):1386-93. doi: 10.1093/bioinformatics/btn178. Epub 2008 Apr 23.

The UCSC Genome Browser Database: 2008 update.

Nucleic Acids Res. 2008 Jan;36(Database issue):D773-9. doi: 10.1093/nar/gkm966. Epub 2007 Dec 17.

NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

Nucleic Acids Res. 2007 Jan;35(Database issue):D61-5. doi: 10.1093/nar/gkl842. Epub 2006 Nov 27.

CpGcluster: a distance-based algorithm for CpG-island detection.

BMC Bioinformatics. 2006 Oct 12;7:446. doi: 10.1186/1471-2105-7-446.
