Genome-wide detection of somatic mosaicism at short tandem repeats.

Author information

Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, United States.

Department of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, United States.

Publication information

Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae485.

Abstract

MOTIVATION

Somatic mosaicism has been implicated in several developmental disorders, cancers, and other diseases. Short tandem repeats (STRs) consist of repeated sequences of 1-6 bp and comprise >1 million loci in the human genome. Somatic mosaicism at STRs is known to play a key role in the pathogenicity of loci implicated in repeat expansion disorders and is highly prevalent in cancers exhibiting microsatellite instability. While a variety of tools have been developed to genotype germline variation at STRs, a method for systematically identifying mosaic STRs is lacking.

RESULTS

We introduce prancSTR, a novel method for detecting mosaic STRs from individual high-throughput sequencing datasets. prancSTR is designed to detect loci characterized by a single high-frequency mosaic allele, but can also detect loci with multiple mosaic alleles. Unlike many existing mosaicism detection methods for other variant types, prancSTR does not require a matched control sample as input. We show that prancSTR accurately identifies mosaic STRs in simulated data, demonstrate its feasibility by identifying candidate mosaic STRs in Illumina whole genome sequencing data derived from lymphoblastoid cell lines for individuals sequenced by the 1000 Genomes Project, and evaluate the use of prancSTR on Element and PacBio data. In addition to prancSTR, we present simTR, a novel simulation framework which simulates raw sequencing reads with realistic error profiles at STRs.

AVAILABILITY AND IMPLEMENTATION

prancSTR and simTR are freely available at https://github.com/gymrek-lab/trtools. Detailed documentation is available at https://trtools.readthedocs.io/.
