Accuracy of short tandem repeats genotyping tools in whole exome sequencing data.

Affiliations

Murdoch Children's Research Institute, Royal Children's Hospital, Parkville, VIC, 3052, Australia.

Peter MacCallum Cancer Centre, 305 Grattan St, Melbourne, VIC, 3000, Australia.

Publication information

F1000Res. 2020 Mar 23;9:200. doi: 10.12688/f1000research.22639.1. eCollection 2020.

Abstract

Short tandem repeats are an important source of genetic variation. They are highly mutable, and repeat expansions are associated with dozens of human disorders, such as Huntington's disease and spinocerebellar ataxias. Technical advances in sequencing have made it possible to analyse these repeats at large scale; however, accurate genotyping is still a challenging task. We compared four short tandem repeat genotyping tools on whole exome sequencing data to determine their genotyping performance and limits, which will aid other researchers in choosing a suitable tool and parameters for analysis. The analysis was performed on the Simons Simplex Collection dataset, where we used a novel evaluation method in which accuracy is determined by the rate of homozygous calls on the X chromosome of male samples. In total, we analysed 433 samples and around a million genotypes. All tools performed relatively well when genotyping repeats of 3-6 bp in length, and performance could be improved further with coverage and quality score filtering. However, genotyping homopolymers was challenging for all tools, and a high error rate persisted across different thresholds of coverage and quality score. Interestingly, dinucleotide repeats displayed a high error rate as well, which was found to be caused mainly by AC/TG repeats. Overall, LobSTR made the most calls and was also the fastest tool, while RepeatSeq and HipSTR exhibited the lowest heterozygous error rate at low coverage. All tools have different strengths and weaknesses, and the choice may depend on the application. In this analysis we demonstrated the effect of using different filtering parameters and offered recommendations based on the trade-off between the best genotyping accuracy and the highest number of calls.
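The evaluation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Call` record, its field names, the threshold values, and the toy data are all hypothetical, and it assumes the calls have already been restricted to X chromosome loci in male samples where a single allele is expected. Under that assumption, any heterozygous call is counted as an error, and raising the coverage or quality threshold trades the number of retained calls against the error rate.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Call:
    """One STR genotype call on chrX in a male sample (hypothetical layout)."""
    allele_a: int    # length of the first called allele
    allele_b: int    # length of the second called allele
    coverage: int    # number of reads supporting the call
    quality: float   # tool-reported quality score, assumed to lie in [0, 1]


def heterozygous_error_rate(
    calls: Iterable[Call],
    min_coverage: int = 0,
    min_quality: float = 0.0,
) -> Tuple[Optional[float], int]:
    """Return (fraction of heterozygous calls, number of calls kept).

    Calls below either threshold are discarded before counting.
    The rate is None when no call passes the filters.
    """
    kept = [
        c for c in calls
        if c.coverage >= min_coverage and c.quality >= min_quality
    ]
    if not kept:
        return None, 0
    # A male carries one X, so two different alleles are treated as an error.
    errors = sum(1 for c in kept if c.allele_a != c.allele_b)
    return errors / len(kept), len(kept)


# Toy data, invented for illustration only.
calls = [
    Call(12, 12, coverage=30, quality=0.99),
    Call(12, 14, coverage=5, quality=0.60),   # heterozygous: counted as error
    Call(20, 20, coverage=8, quality=0.95),
    Call(9, 9, coverage=25, quality=0.97),
]

print(heterozygous_error_rate(calls))
print(heterozygous_error_rate(calls, min_coverage=10, min_quality=0.9))
```

On this toy input the unfiltered rate is one error in four calls, while the stricter thresholds keep only two calls and no errors, which mirrors the trade-off between accuracy and call count that the abstract describes.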


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed77/7327730/670d3495479c/f1000research-9-24995-g0000.jpg
