Improved repeat identification and masking in Dipterans.

Authors

Smith Christopher D, Edgar Robert C, Yandell Mark D, Smith Douglas R, Celniker Susan E, Myers Eugene W, Karpen Gary H

Affiliation

Department of Biology, San Francisco State University, San Francisco, CA, United States.

Publication information

Gene. 2007 Mar 1;389(1):1-9. doi: 10.1016/j.gene.2006.09.011. Epub 2006 Oct 12.

DOI:10.1016/j.gene.2006.09.011

PMID:17137733

Full-text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC1945102/

Abstract

Repetitive sequences are a major constituent of many eukaryote genomes and play roles in gene regulation, chromosome inheritance, nuclear architecture, and genome stability. The identification of repetitive elements has traditionally relied on in-depth, manual curation and computational determination of close relatives based on DNA identity. However, the rapid divergence of repetitive sequence has made identification of repeats by DNA identity difficult even in closely related species. Hence, the presence of unidentified repeats in genome sequences affects the quality of gene annotations and annotation-dependent analyses (e.g. microarray analyses). We have developed an enhanced repeat identification pipeline using two approaches. First, the de novo repeat finding program PILER-DF was used to identify interspersed repetitive elements in several recently finished Dipteran genomes. Repeats were classified, when possible, according to their similarity to known elements described in Repbase and GenBank, and also screened against annotated genes as one means of eliminating false positives. Second, we used a new program called RepeatRunner, which integrates results from both RepeatMasker nucleotide searches and protein searches using BLASTX. Using RepeatRunner with PILER-DF predictions, we masked repeats in thirteen Dipteran genomes and conclude that combining PILER-DF and RepeatRunner greatly enhances repeat identification in both well-characterized and un-annotated genomes.
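The abstract describes combining repeat evidence from two kinds of search (nucleotide and protein) before masking. The sketch below illustrates that general idea only: it merges hit coordinates from two sources into disjoint intervals and soft-masks the covered bases. It is not RepeatRunner's actual implementation, which the abstract does not detail. The function names, the hit coordinates, the toy sequence, and the choice of lower-case (soft) masking are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: 0-based, half-open [start, end) coordinates.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def soft_mask(sequence: str, intervals: List[Interval]) -> str:
    """Lower-case every base covered by at least one interval."""
    chars = list(sequence.upper())
    for start, end in merge_intervals(intervals):
        # Clamp to the sequence bounds so stray coordinates cannot fail.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


# Illustrative (made-up) hits from two evidence sources.
nucleotide_hits = [(2, 6), (5, 9)]  # e.g. from a nucleotide-level search
protein_hits = [(12, 15)]           # e.g. from a translated protein search

masked = soft_mask("ACGTACGTACGTACGTACGT", nucleotide_hits + protein_hits)
print(masked)
```

Pooling the intervals before masking means a region supported by either source is masked once, and overlapping hits from both sources do not need special handling.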
