Dante: genotyping of known complex and expanded short tandem repeats.

Author information

Department of Computer Science, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Bratislava, Slovakia.

Geneton Ltd., Bratislava, Slovakia.

Publication information

Bioinformatics. 2019 Apr 15;35(8):1310-1317. doi: 10.1093/bioinformatics/bty791.

Abstract

MOTIVATION

Short tandem repeats (STRs) are stretches of repetitive DNA in which short sequences, typically 2-6 nucleotides long, are repeated several times. Because STRs have many important biological roles and are among the most polymorphic parts of the human genome, they are used in several molecular-genetic applications. Precise genotyping of STR alleles has therefore been highly relevant in recent decades. Despite this, massively parallel sequencing (MPS) still lacks analysis methods that fully exploit the information value of STRs in genome-scale assays.

RESULTS

We propose an alignment-free algorithm, called Dante, for genotyping and characterization of STR alleles at user-specified known loci, based on sequence reads originating from the STR loci of interest. The method accounts for natural deviations from the expected sequence, such as variation in the repeat count, sequencing errors, ambiguous bases and complex loci containing several different motifs. In addition, we implemented a correction for copy number defects caused by the polymerase-induced stutter effect, as well as a prediction of STR expansions that, according to the conventional view, cannot be fully captured by inherently short MPS reads. We tested Dante on simulated datasets and on datasets obtained by targeted sequencing of the protein-coding parts of thousands of selected clinically relevant genes. On both kinds of datasets, Dante outperformed the HipSTR and GATK genotyping tools. Furthermore, Dante was able to predict allele expansions in all tested clinical cases.
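To make the idea of alignment-free STR genotyping concrete, the following is a minimal illustrative sketch and not Dante's implementation. It assumes a simple locus with a single motif, exact (error-free) flanking sequences, and reads that span the whole repeat. The function names `count_repeats` and `call_genotype`, the `min_fraction` threshold and the example sequences are hypothetical, introduced only for this illustration. It does not model sequencing errors, ambiguous bases, complex loci, stutter correction or expansions longer than a read, all of which the abstract says Dante handles.

```python
import re
from collections import Counter


def count_repeats(read, left_flank, motif, right_flank):
    """Count motif copies between the two flanks of one read.

    Returns None when the read does not contain both flanks around an
    uninterrupted run of the motif (hypothetical simplification).
    """
    pattern = re.compile(
        re.escape(left_flank)
        + "((?:" + re.escape(motif) + ")+)"
        + re.escape(right_flank)
    )
    match = pattern.search(read)
    if match is None:
        return None
    return len(match.group(1)) // len(motif)


def call_genotype(reads, left_flank, motif, right_flank, min_fraction=0.2):
    """Pick up to two repeat counts supported by the reads.

    min_fraction is an assumed, arbitrary cut-off: a second allele is only
    reported if it is backed by at least this share of the spanning reads,
    which crudely filters low-frequency counts such as stutter products.
    """
    counts = Counter()
    for read in reads:
        n = count_repeats(read, left_flank, motif, right_flank)
        if n is not None:
            counts[n] += 1
    if not counts:
        return None  # no read spans the locus
    total = sum(counts.values())
    ranked = counts.most_common(2)
    first = ranked[0][0]
    if len(ranked) == 2 and ranked[1][1] / total >= min_fraction:
        second = ranked[1][0]
    else:
        second = first  # treat the locus as homozygous
    return tuple(sorted((first, second)))


if __name__ == "__main__":
    left, motif, right = "ACGT", "CAG", "TTGA"
    reads = (
        [left + motif * 5 + right] * 6    # allele with 5 repeats
        + [left + motif * 8 + right] * 5  # allele with 8 repeats
        + [left + motif * 4 + right]      # one stutter-like read
        + ["GGGG"]                        # read that misses the locus
    )
    print(call_genotype(reads, left, motif, right))
```

A fixed frequency threshold is only a stand-in here; the abstract describes an explicit correction for the stutter effect, which this sketch does not attempt to reproduce.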

AVAILABILITY AND IMPLEMENTATION

Dante is open source software, freely available for download at https://github.com/jbudis/dante.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
