Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm.

Affiliation

Faculty of Science, University of Zagreb, Bijenička 32 and Croatian Academy of Sciences and Arts, Zrinski trg 11, 10000 Zagreb, Croatia.

Publication information

Nucleic Acids Res. 2013 Jan 7;41(1):e17. doi: 10.1093/nar/gks721. Epub 2012 Sep 12.

Abstract

The main feature of the global repeat map (GRM) algorithm (www.hazu.hr/grm/software/win/grm2012.exe) is its ability to identify a broad variety of repeats of unbounded length that can be arbitrarily distant in sequences as large as human chromosomes. This efficacy stems from the use of a complete K-string ensemble, which enables a new method of mapping a symbolic DNA sequence directly into the frequency domain, so that repeats are identified straightforwardly as peaks in the GRM diagram. The result is a very fast, efficient and highly automated repeat-finding tool. The method is robust to substitutions and insertions/deletions, as well as to various complexities of the sequence pattern. We present several case studies of GRM use to illustrate its capabilities: identification of α-satellite tandem repeats and higher order repeats (HORs), identification of Alu dispersed repeats and Alu tandems, identification of the period-3 pattern in exons, implementation of a 'magnifying glass' effect, identification of a complex HOR pattern, identification of inter-tandem transitional dispersed repeat sequences and identification of long segmental duplications. The GRM algorithm is particularly convenient for large repeat units, for highly mutated and/or complex repeats, and for global repeat maps of large genomic sequences (chromosomes and genomes).
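
The idea of reading repeats off as peaks can be illustrated with a minimal Python sketch. This is not the GRM implementation, whose details the abstract does not give. It assumes a simplified stand-in: for every K-string in the sequence, record the distance to its previous occurrence, and histogram those distances, so that a repeat unit of length n shows up as a peak at n. The function names `kmer_distance_spectrum` and `dominant_period`, and the default `k=8`, are hypothetical choices made for illustration.

```python
from collections import Counter


def kmer_distance_spectrum(seq, k=8):
    """Histogram of distances between consecutive occurrences of each k-mer.

    Illustrative simplification only; not the published GRM code.
    """
    last_seen = {}        # k-mer -> start position of its latest occurrence
    spectrum = Counter()  # distance -> number of k-mer pairs at that distance
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            spectrum[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    return spectrum


def dominant_period(seq, k=8):
    """Return the distance with the highest count, or None if no k-mer recurs."""
    spectrum = kmer_distance_spectrum(seq, k)
    if not spectrum:
        return None
    # On ties, prefer the shorter distance.
    return max(spectrum, key=lambda d: (spectrum[d], -d))


if __name__ == "__main__":
    # Toy tandem repeat: an 11-base unit copied 20 times.
    toy = "ACGTTGCAAGC" * 20
    print(dominant_period(toy))  # prints 11
```

Because every K-string contributes to the histogram, a single substitution only disturbs the few K-strings that overlap it, which is one plausible reading of why an ensemble-based approach tolerates mutations.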

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eb4c/3592446/448323a3d78a/gks721f1.jpg
