TRGT-denovo: accurate detection of tandem repeat mutations.

Authors

Mokveld T, Dolzhenko E, Dashnow H, Nicholas T J, Sasani T, van der Sanden B, Jadhav B, Pedersen B, Kronenberg Z, Tucci A, Sharp A J, Quinlan A R, Gilissen C, Hoischen A, Eberle M A

Affiliations

PacBio, Menlo Park, CA.

Univ. of Utah, Salt Lake City, UT.

Publication information

bioRxiv. 2024 Jul 19:2024.07.16.600745. doi: 10.1101/2024.07.16.600745.

Abstract

MOTIVATION

Identifying tandem repeat (TR) mutations on a genome-wide scale is essential for understanding genetic variability and its implications in rare diseases. While PacBio HiFi sequencing data enhances the accessibility of the genome's TR regions for genotyping, simple calling strategies often generate an excess of likely false positives, which can obscure true positive findings, particularly as the number of surveyed genomic regions increases.

RESULTS

We developed TRGT-denovo, a computational method designed to accurately identify all types of TR mutations (expansions, contractions, and compositional changes) within family trios. TRGT-denovo directly interrogates read evidence, allowing it to detect subtle variations that are often overlooked in variant call format (VCF) files. TRGT-denovo improves the precision and specificity of de novo mutation (DNM) identification, reducing the number of candidates by an order of magnitude compared to genotype-based approaches. In our experiments involving eight previously studied rare disease trios, TRGT-denovo correctly reclassified all false positive DNM candidates as true negatives. Using an expanded repeat catalog, it identified new candidates, of which 95% (19/20) were experimentally validated, demonstrating its effectiveness in minimizing likely false positives while maintaining high sensitivity for true discoveries.
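The contrast between genotype-based calling and read-level interrogation can be illustrated with a toy sketch. This is not TRGT-denovo's algorithm: the function names, the use of repeat-unit counts as allele lengths, and the simple tolerance rule are all hypothetical, chosen only to show how a candidate flagged from genotypes alone may turn out to be supported by parental reads.

```python
from typing import Sequence


def genotype_candidate(child: Sequence[int],
                       father: Sequence[int],
                       mother: Sequence[int]) -> bool:
    """Naive genotype-based check (hypothetical): flag the locus if any
    child allele length is absent from both parents' called genotypes."""
    parental = set(father) | set(mother)
    return any(allele not in parental for allele in child)


def read_supported(child_allele: int,
                   parent_read_lengths: Sequence[int],
                   tolerance: int = 0) -> bool:
    """True if at least one parental read carries a repeat length within
    `tolerance` units of the child allele (assumed, simplified rule)."""
    return any(abs(length - child_allele) <= tolerance
               for length in parent_read_lengths)


def read_level_candidate(child: Sequence[int],
                         father_reads: Sequence[int],
                         mother_reads: Sequence[int],
                         tolerance: int = 0) -> bool:
    """Flag the locus only if some child allele has no support in any
    parental read, rather than relying on the called genotypes."""
    parental_reads = list(father_reads) + list(mother_reads)
    return any(not read_supported(allele, parental_reads, tolerance)
               for allele in child)


# Made-up example data: allele lengths in repeat units.
child_gt = (30, 45)
father_gt, mother_gt = (30, 44), (31, 44)
father_reads = [30, 30, 44, 44, 45, 44]  # one read carries length 45
mother_reads = [31, 31, 44, 44]

flagged_by_genotype = genotype_candidate(child_gt, father_gt, mother_gt)
flagged_by_reads = read_level_candidate(child_gt, father_reads, mother_reads)
```

In this invented example the genotype comparison flags the locus because 45 is missing from both parental calls, while the read-level comparison does not, since one paternal read already carries that length. A real method would also need to weigh coverage, sequencing error, and allele composition, none of which this sketch models.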

AVAILABILITY AND IMPLEMENTATION

Built in Rust, TRGT-denovo is available as source code and a pre-compiled Linux binary along with a user guide at: https://github.com/PacificBiosciences/trgt-denovo.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a9ee/11275785/6ee7865532ea/nihpp-2024.07.16.600745v1-f0001.jpg