
An improved approach for reconstructing consensus repeats from short sequence reads.

Affiliations

Department of Biomedical Informatics, Harvard Medical School, 10 Shattuck Street, Boston, 02115, MA, USA.

Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Way, Unit 2155, Storrs, 06269, CT, USA.

Publication information

BMC Genomics. 2018 Aug 13;19(Suppl 6):566. doi: 10.1186/s12864-018-4920-6.

DOI:10.1186/s12864-018-4920-6
PMID:30367582
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6101065/
Abstract

BACKGROUND

Repeat elements are important components of most eukaryotic genomes. Most existing tools for repeat analysis rely on either high-quality reference genomes or existing repeat libraries. Repeat analysis therefore remains challenging for species with highly repetitive or complex genomes, which often lack good reference genomes or annotated repeat libraries. We recently developed a computational method called REPdenovo that constructs consensus repeat sequences directly from short sequence reads and outperforms an existing tool called RepARK. One major issue with REPdenovo is that it does not perform well for repeats with relatively high divergence rates or low copy numbers. In this paper, we present an improved approach for constructing consensus repeats directly from short reads. Compared with the original REPdenovo, the improved approach uses more repeat-related k-mers and improves repeat assembly quality using a consensus-based k-mer processing method.
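
As a rough illustration of what "repeat-related k-mers" means, the sketch below counts k-mers across a handful of reads and keeps those seen at least a threshold number of times. It is a toy example, not the REPdenovo implementation: the function names, the value of k, and the fixed threshold are all assumptions made here, and the abstract does not describe how the actual tool selects or processes its k-mers.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (length-k substring) across all reads."""
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def frequent_kmers(counts, min_count):
    """Keep k-mers seen at least min_count times (hypothetical cutoff)."""
    return {kmer for kmer, n in counts.items() if n >= min_count}


# Toy reads: the first two overlap a repeated ACGT motif, the third does not.
reads = ["ACGTACGTAC", "CGTACGTACG", "TTGACCA"]
counts = count_kmers(reads, k=4)
repeat_like = frequent_kmers(counts, min_count=3)
print(sorted(repeat_like))
```

On real data the cutoff would normally be chosen relative to sequencing coverage rather than fixed, and reverse-complement k-mers would be merged. Both are omitted here to keep the sketch short.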

RESULTS

We compare the performance of the new method with that of REPdenovo and RepARK on human, Arabidopsis thaliana and Drosophila melanogaster short-read sequencing data. The new method fully constructs more of the repeats in Repbase than the original REPdenovo and RepARK, especially repeats with higher divergence rates and lower copy numbers. We also apply the new method to hummingbird data, for which no repeat library is known, and it constructs many repeat elements that can be validated using PacBio long reads.

CONCLUSION

We propose an improved method for reconstructing repeat elements directly from short sequence reads. The results show that the new method assembles more complete repeats than REPdenovo and RepARK. The new approach has been implemented as part of the REPdenovo software package, which is available for download at https://github.com/Reedwarbler/REPdenovo.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bfb4/6101065/7c7e119d179f/12864_2018_4920_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bfb4/6101065/f3c801b20ae9/12864_2018_4920_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bfb4/6101065/6aa2b24c74a2/12864_2018_4920_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bfb4/6101065/f3cd82c986d4/12864_2018_4920_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bfb4/6101065/64bb7d435577/12864_2018_4920_Fig5_HTML.jpg