Hybrid de novo tandem repeat detection using short and long reads.

Authors

Fertin Guillaume, Jean Géraldine, Radulescu Andreea, Rusu Irena

Publication information

BMC Med Genomics. 2015;8 Suppl 3(Suppl 3):S5. doi: 10.1186/1755-8794-8-S3-S5. Epub 2015 Sep 23.

DOI:10.1186/1755-8794-8-S3-S5
PMID:26399998
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4582210/
Abstract

BACKGROUND

Tandem repeats are among the most studied genome rearrangements and have a considerable impact on the genetic background of inherited diseases. Many methods designed to detect tandem repeats on reference sequences obtain high-quality results. However, in a de novo context, where no reference sequence is available, tandem repeat detection remains a difficult problem. The short reads produced by second-generation sequencing methods are not long enough to span regions that contain long repeats. Long reads from third-generation sequencing platforms, such as Pacific Biosciences technologies, address this length limitation. Nevertheless, the gain in read length came with a significant increase in the error rate. A main objective of current studies on long reads is to handle this high error rate, which can reach 16%.

METHODS

In this paper we present MixTaR, the first de novo method for tandem repeat detection that combines the high quality of short reads with the length of long reads. Our hybrid algorithm uses the set of short reads to detect tandem repeat patterns in a de Bruijn graph. These patterns are then validated using the long reads, and the tandem repeat sequences are constructed using local greedy assemblies.
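
The pattern-detection step can be pictured with a toy example. The Python sketch below is not MixTaR's implementation. It only assumes that a tandem repeat appears as a cycle in a de Bruijn graph built from error-free short reads, and that a long read supports a pattern when it contains consecutive exact copies of it. The function names, the `max_len` bound and the `min_copies` threshold are hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def canonical_rotation(s):
    """Smallest rotation of s, so that ACG, CGA and GAC count as one pattern."""
    return min(s[i:] + s[:i] for i in range(len(s)))


def find_cycle_patterns(graph, max_len):
    """Collect one candidate pattern per simple cycle with at most max_len nodes."""
    patterns = set()
    for start in list(graph):
        stack = [(start, [start])]
        while stack:
            node, path = stack.pop()
            # .get() avoids creating new keys in the defaultdict
            for nxt in graph.get(node, ()):
                if nxt == start:
                    # Cycle closed: each node contributes its last character.
                    pattern = "".join(n[-1] for n in path)
                    patterns.add(canonical_rotation(pattern))
                elif nxt not in path and len(path) < max_len:
                    stack.append((nxt, path + [nxt]))
    return patterns


def supported_by_long_read(pattern, long_read, min_copies=2):
    """Exact check (assumption: no sequencing errors) for tandem copies of any rotation."""
    rotations = (pattern[i:] + pattern[:i] for i in range(len(pattern)))
    return any(rot * min_copies in long_read for rot in rotations)


short_reads = ["TTACGACGAC", "CGACGACGGA"]
candidates = find_cycle_patterns(build_de_bruijn(short_reads, k=4), max_len=10)
long_read = "GGTTACGACGACGACGGATT"
validated = {p for p in candidates if supported_by_long_read(p, long_read)}
print(validated)  # {'ACG'}
```

On these toy reads the only cycle spells the unit ACG, and the long read confirms it. Real long reads contain errors, so an actual validation step would need approximate matching rather than the exact substring test used here.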

RESULTS

MixTaR is tested with both simulated and real reads from complex organisms. To analyse its robustness to errors, we use short and long reads with different error rates. The results are then analysed in terms of the number of tandem repeats detected and the length of their patterns.
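
To make "reads with different error rates" concrete, the sketch below injects errors into a read at a chosen per-base rate. It is a toy model under stated assumptions, not the simulator used in the paper: each base is hit independently, and a hit is a substitution, insertion or deletion chosen with equal probability. The function name `add_errors` and the example sequence are hypothetical.

```python
import random


def add_errors(read, rate, rng):
    """Return a copy of read in which each base is corrupted with probability rate."""
    out = []
    for base in read:
        if rng.random() >= rate:
            out.append(base)  # base kept unchanged
            continue
        kind = rng.choice(("sub", "ins", "del"))
        if kind == "sub":
            # replace the base with a different one
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        elif kind == "ins":
            # keep the base and insert a random base after it
            out.append(base)
            out.append(rng.choice("ACGT"))
        # "del": emit nothing for this base
    return "".join(out)


clean = "TTACGACGACGACGGATT"
noisy_low = add_errors(clean, 0.01, random.Random(0))
noisy_high = add_errors(clean, 0.16, random.Random(0))
```

Passing a seeded `random.Random` keeps each run reproducible, which makes it easier to compare a detection method on the same reads at several error rates.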

CONCLUSIONS

Our method shows high precision and sensitivity. With low false positive rates even for highly erroneous reads, MixTaR is able to detect accurate tandem repeats with pattern lengths spanning a wide range.
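
For readers unfamiliar with the two measures, the sketch below computes precision and sensitivity from a set of detected patterns and a set of true patterns. It assumes that a detection counts as correct only when it matches a true pattern exactly; the abstract does not state the matching criterion used in the paper, and the example sets are made up for illustration.

```python
def precision_sensitivity(detected, truth):
    """Precision = TP / detected, sensitivity = TP / truth, with exact matching."""
    detected, truth = set(detected), set(truth)
    true_positives = len(detected & truth)
    precision = true_positives / len(detected) if detected else 0.0
    sensitivity = true_positives / len(truth) if truth else 0.0
    return precision, sensitivity


detected = {"ACG", "AT", "GGC"}
truth = {"ACG", "AT", "TTAGGG", "CA"}
precision, sensitivity = precision_sensitivity(detected, truth)
```

Here two of the three detected patterns are true (precision 2/3) and two of the four true patterns are found (sensitivity 1/2). A low false positive rate corresponds to high precision.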

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/982f/4582210/fb80cbd75ed0/1755-8794-8-S3-S5-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/982f/4582210/1d811becd5fc/1755-8794-8-S3-S5-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/982f/4582210/3158e7690d8f/1755-8794-8-S3-S5-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/982f/4582210/4a6e080964ed/1755-8794-8-S3-S5-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/982f/4582210/d709fe241a4f/1755-8794-8-S3-S5-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/982f/4582210/8a075578a673/1755-8794-8-S3-S5-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/982f/4582210/97b73a7edfc6/1755-8794-8-S3-S5-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/982f/4582210/afa4e2ce4421/1755-8794-8-S3-S5-8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/982f/4582210/3483553f41c3/1755-8794-8-S3-S5-9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/982f/4582210/2ec476615d4a/1755-8794-8-S3-S5-10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/982f/4582210/edac1a239ace/1755-8794-8-S3-S5-11.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/982f/4582210/286a9b3cfda2/1755-8794-8-S3-S5-12.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/982f/4582210/dc2e1bd8b55c/1755-8794-8-S3-S5-13.jpg
