A new repeat-masking method enables specific detection of homologous sequences.

Affiliation

Computational Biology Research Center, Institute for Advanced Industrial Science and Technology, Sequence Analysis Team, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan.

Publication information

Nucleic Acids Res. 2011 Mar;39(4):e23. doi: 10.1093/nar/gkq1212. Epub 2010 Nov 24.

DOI: 10.1093/nar/gkq1212

PMID: 21109538

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3045581/

Abstract

Biological sequences are often analyzed by detecting homologous regions between them. Homology search is confounded by simple repeats, which give rise to strong similarities that are not homologies. Standard repeat-masking methods fail to eliminate this problem, and they are especially ill-suited to AT-rich DNA such as malaria and slime-mould genomes. We present a new repeat-masking method, TANTAN, which is motivated by the mechanisms that create simple repeats. This method thoroughly eliminates spurious homology predictions for DNA-DNA, protein-protein and DNA-protein comparisons. Moreover, it enables accurate homology search for non-coding DNA with extreme A + T composition.
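To make the idea of repeat masking concrete, the following is a minimal sketch of soft-masking exact tandem repeats by lowercasing them, a common convention that lets a downstream aligner skip masked letters when seeding. It is not the TANTAN algorithm: it only catches perfect repeats, and the function name `mask_tandem_repeats` and the thresholds `max_period` and `min_copies` are hypothetical choices made for illustration.

```python
def mask_tandem_repeats(seq: str, max_period: int = 4, min_copies: int = 3) -> str:
    """Lowercase every region where a unit of length <= max_period
    repeats exactly, back to back, at least min_copies times.

    Illustrative only: exact matching, no probabilistic model.
    """
    seq = seq.upper()
    n = len(seq)
    masked = [False] * n

    for period in range(1, max_period + 1):
        run = 0  # consecutive positions i with seq[i] == seq[i - period]
        # Iterate one step past the end so a run touching the end is flushed.
        for i in range(period, n + 1):
            if i < n and seq[i] == seq[i - period]:
                run += 1
                continue
            # The run ended just before i; the periodic region is
            # seq[i - run - period : i], of length run + period.
            if run + period >= period * min_copies:
                for j in range(i - run - period, i):
                    masked[j] = True
            run = 0

    return "".join(c.lower() if m else c for c, m in zip(seq, masked))


if __name__ == "__main__":
    # The (AT)5 run is lowercased; the flanking sequence is left alone.
    print(mask_tandem_repeats("GATTACA" + "AT" * 5 + "CGGTCA"))
```

A hard threshold like this shows why simple heuristics struggle on AT-rich DNA: loosening it masks large stretches of genuine sequence, while tightening it leaves degenerate repeats unmasked.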
