Transposable element annotation of the rice genome.

Authors

Juretic Nikoleta, Bureau Thomas E, Bruskiewich Richard M

Affiliation

Department of Biology, McGill University, Montreal, Quebec, H3A 1B1 Canada.

Publication

Bioinformatics. 2004 Jan 22;20(2):155-60. doi: 10.1093/bioinformatics/bth019.

DOI:10.1093/bioinformatics/bth019
PMID:14734305
Abstract

MOTIVATION

The high content of repetitive sequences in the genomes of many higher eukaryotes renders the task of annotating them computationally intensive. Presently, the only widely accepted method of searching and annotating transposable elements (TEs) in large genomic sequences is the use of the RepeatMasker program, which identifies new copies of TEs by pairwise sequence comparisons with a library of known TEs. Profile hidden Markov models (HMMs) have been used successfully in discovering distant homologs of known proteins in large protein databases, but this approach has only rarely been applied to known model TE families in genomic DNA.

RESULTS

We used a combination of computational approaches to annotate the TEs in the finished genome of Oryza sativa ssp. japonica. In this paper, we discuss the strengths and the weaknesses of the annotation methods used. These approaches included: the default configuration of RepeatMasker using cross_match, an implementation of the Smith-Waterman-Gotoh algorithm; RepeatMasker using WU-BLAST for similarity searching; and the HMMER package, used to search for TEs with profile HMMs. All the results were converted into GFF format and post-processed using a set of Perl scripts. RepeatMasker was used in the case of most TE families. The WU-BLAST implementation of RepeatMasker was found to be manifold faster than cross_match with only a slight loss in sensitivity and was thus used to obtain the final set of data. HMMER was used in the annotation of the Mutator-like element (MULE) superfamily and the miniature inverted-repeat transposable element (MITE) polyphyletic group of families, for which large libraries of elements were available and which could be divided into well-defined families. The HMMER search algorithm was extremely slow for models over 1000 bp in length, so MULE families with members over 1000 bp long were processed with RepeatMasker instead. The main disadvantage of HMMER in this application is that, since it was developed with protein sequences in mind, it does not search the negative DNA strand. With the exception of TE families with essentially palindromic sequences, reverse complement models had to be created and run to compensate for this shortcoming. We conclude that a modification of RepeatMasker to incorporate libraries of profile HMMs in searches could improve the ability to detect degenerated copies of TEs.
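
The minus-strand workaround and the GFF conversion described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' Perl scripts (which are available only on request). The function names, the `transposable_element` feature type, and the `family=` attribute are hypothetical choices. The sketch shows two things: reverse-complementing every row of an aligned TE family yields a valid alignment from which a minus-strand profile HMM could be built, and a hit from such a model can be written as a nine-column GFF line with strand `-`.

```python
# Illustrative sketch only: names and the attribute format are assumptions,
# not taken from the paper or from the HMMER / RepeatMasker tools.

# Translation table for DNA complement; gap characters map to themselves.
_COMPLEMENT = str.maketrans("ACGTNacgtn-.", "TGCANtgcan-.")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string, keeping gap characters ('-', '.')."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_complement_alignment(rows):
    """Reverse-complement every row of an aligned TE family.

    All rows share the same length, so reversing each one keeps the
    columns aligned; the result could seed a minus-strand profile model.
    """
    return [reverse_complement(row) for row in rows]


def to_gff(seqid, source, family, start, end, score, strand):
    """Format one hit as a tab-separated, nine-column GFF line.

    Coordinates are 1-based and inclusive. Hits found with a
    reverse-complement model are reported with strand '-'.
    """
    fields = [
        seqid,                    # 1: sequence name
        source,                   # 2: program that produced the hit
        "transposable_element",   # 3: feature type (assumed label)
        str(start),               # 4: start
        str(end),                 # 5: end
        f"{score:.1f}",           # 6: score
        strand,                   # 7: '+' or '-'
        ".",                      # 8: frame (not applicable here)
        f"family={family}",       # 9: attributes (assumed format)
    ]
    return "\t".join(fields)
```

For a family whose consensus is essentially palindromic, `reverse_complement` returns a sequence nearly identical to its input, which is why the abstract notes that such families did not need a separate reverse-complement model.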

AVAILABILITY

The Perl scripts and TE sequences used in construction of the RepeatMasker library and the profile HMMs are available upon request.
