Accuracy of short tandem repeats genotyping tools in whole exome sequencing data.

Affiliations

Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, VIC, 3052, Australia.

Peter MacCallum Cancer Centre, 305 Grattan St, Melbourne, VIC, 3000, Australia.

Publication information

F1000Res. 2020 Mar 23;9:200. doi: 10.12688/f1000research.22639.1. eCollection 2020.

DOI:10.12688/f1000research.22639.1

PMID:32665844

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7327730/

Abstract

Short tandem repeats are an important source of genetic variation. They are highly mutable, and repeat expansions are associated with dozens of human disorders, such as Huntington's disease and spinocerebellar ataxias. Technical advances in sequencing have made it possible to analyse these repeats at large scale; however, accurate genotyping is still a challenging task. We compared four short tandem repeat genotyping tools on whole exome sequencing data to determine their genotyping performance and limits, which will aid other researchers in choosing a suitable tool and parameters for analysis. The analysis was performed on the Simons Simplex Collection dataset, using a novel evaluation method in which accuracy is determined by the rate of homozygous calls on the X chromosome of male samples. In total we analysed 433 samples and around a million genotypes. All tools performed relatively well when genotyping repeats of 3-6 bp in length, and their performance could be improved with coverage and quality score filtering. However, genotyping homopolymers was challenging for all tools, and a high error rate persisted across different thresholds of coverage and quality score. Interestingly, dinucleotide repeats displayed a high error rate as well, which was found to be mainly caused by the AC/TG repeats. Overall, LobSTR made the most calls and was also the fastest tool, while RepeatSeq and HipSTR exhibited the lowest heterozygous error rate at low coverage. All tools have different strengths and weaknesses, and the choice may depend on the application. In this analysis we demonstrated the effect of using different filtering parameters and offered recommendations based on the trade-off between the best genotyping accuracy and the highest number of calls.
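The evaluation idea in the abstract can be illustrated with a small sketch. Males carry a single X chromosome, so outside the pseudoautosomal regions a heterozygous call on male chrX can be treated as a genotyping error. The code below is not the authors' pipeline: the `Call` record, its field names, the threshold parameters and the toy values are all hypothetical, and it assumes the calls have already been restricted to male samples and to non-pseudoautosomal chrX loci.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One STR genotype call on chrX of a male sample (hypothetical record)."""
    allele1: int     # length of the first called allele, in bp
    allele2: int     # length of the second called allele, in bp
    coverage: int    # number of reads supporting the call
    quality: float   # tool-reported quality score, assumed to lie in [0, 1]


def heterozygous_error_rate(calls, min_coverage=0, min_quality=0.0):
    """Return (error_rate, n_kept) after coverage and quality filtering.

    A call counts as an error when its two alleles differ, because a
    hemizygous locus should always be called homozygous.
    Returns (None, 0) when no call passes the filters.
    """
    kept = [
        c for c in calls
        if c.coverage >= min_coverage and c.quality >= min_quality
    ]
    if not kept:
        return None, 0
    errors = sum(1 for c in kept if c.allele1 != c.allele2)
    return errors / len(kept), len(kept)


# Toy data, invented for illustration only.
calls = [
    Call(12, 12, coverage=30, quality=0.95),
    Call(10, 11, coverage=4, quality=0.40),   # heterozygous, low coverage
    Call(15, 15, coverage=8, quality=0.90),
    Call(20, 22, coverage=25, quality=0.50),  # heterozygous, low quality
]

unfiltered = heterozygous_error_rate(calls)
filtered = heterozygous_error_rate(calls, min_coverage=5, min_quality=0.8)
print(unfiltered, filtered)
```

With these toy values, filtering halves the number of retained calls while removing both heterozygous calls, which mirrors the trade-off between accuracy and number of calls that the abstract describes.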

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed77/7327730/670d3495479c/f1000research-9-24995-g0000.jpg
