Comparative analysis using K-mer and K-flank patterns provides evidence for CpG island sequence evolution in mammalian genomes.

Author information

Department of Computer Science, School of Informatics and Computing, Indiana University, Bloomington, IN, USA.

Publication information

Nucleic Acids Res. 2013 May;41(9):4783-91. doi: 10.1093/nar/gkt144. Epub 2013 Mar 21.

Abstract

CpG islands are GC-rich regions often located at the 5' end of genes and normally protected from cytosine methylation in mammals. The important role of CpG islands in gene transcription strongly suggests evolutionary conservation in the mammalian genome. However, as CpG dinucleotides are over-represented in CpG islands, comparative CpG island analysis using conventional sequence analysis techniques remains a major challenge in the epigenetics field. In this study, we conducted a comparative analysis of all CpG island sequences in 10 mammalian genomes. As sequence similarity methods and character composition techniques such as information theory are particularly difficult to conduct, we used exact patterns in CpG island sequences and single character discrepancies to identify differences in CpG island sequences. First, by calculating genome distance based on rank correlation tests, we show that k-mer and k-flank patterns around CpG sites can be used to correctly reconstruct the phylogeny of 10 mammalian genomes. Further, we used various machine learning algorithms to demonstrate that CpG island sequences can be characterized using k-mers. In addition, by testing a human model on the nine different mammalian genomes, we provide the first evidence that k-mer signatures are consistent with evolutionary history.
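The idea of a rank-correlation-based genome distance over k-mer and k-flank counts can be illustrated with a minimal sketch. Several things here are assumptions rather than the paper's method: the abstract does not name the rank correlation statistic, so Spearman's rho is used; the distance is taken as 1 minus rho; a "k-flank" is read as the k bases on each side of a CG dinucleotide; and all function names are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_counts(seqs, k):
    """Count every overlapping k-mer made only of A/C/G/T across the sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def kflank_counts(seqs, k):
    """Count CG sites together with k bases on each side.

    Assumption: this reading of "k-flank" is illustrative only.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        # i is the position of the C; both flanks must fit inside the sequence.
        for i in range(k, len(seq) - k - 1):
            if seq[i:i + 2] == "CG":
                counts[seq[i - k:i + 2 + k]] += 1
    return counts


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for constant input; treat as none
    return cov / (vx * vy) ** 0.5


def genome_distance(counts_a, counts_b, k):
    """Assumed distance: 1 - Spearman's rho over all 4**k possible k-mers."""
    keys = ["".join(p) for p in product("ACGT", repeat=k)]
    x = [counts_a.get(key, 0) for key in keys]
    y = [counts_b.get(key, 0) for key in keys]
    return 1.0 - spearman(x, y)
```

Under these assumptions, a pairwise distance matrix computed over the CpG island sets of several genomes could be passed to any standard tree-building method. Genomes whose k-mer rankings agree closely would receive distances near 0.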

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/898d/3643570/6a99254b3430/gkt144f1p.jpg
