Quantitative group testing-based overlapping pool sequencing to identify rare variant carriers.

Affiliation

State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.

Publication information

BMC Bioinformatics. 2014 Jun 17;15:195. doi: 10.1186/1471-2105-15-195.

Abstract

BACKGROUND

Genome-wide association studies have revealed that rare variants are responsible for a large portion of the heritability of some complex human diseases. This finding makes detecting and screening for rare variants increasingly important. Although massively parallel sequencing technologies have greatly reduced the cost of DNA sequencing, identifying rare variant carriers by large-scale re-sequencing remains prohibitively expensive because of the huge challenge of constructing libraries for thousands of samples. Recently, several studies have reported that techniques from group testing theory and compressed sensing could help identify rare variant carriers in large sample collections using few pooled sequencing experiments and at a dramatically reduced cost.

RESULTS

Based on quantitative group testing, we propose an overlapping pool sequencing strategy that recovers variant carriers among numerous individuals at a much lower cost than conventional methods. We used random k-set pool designs to mix samples, and optimized the design parameters according to an indicative probability. Based on a mathematical model of the sequencing depth distribution, an optimal threshold was selected to declare a pool positive or negative. Then, using the quantitative information contained in the sequencing results, we designed a heuristic Bayesian probability decoding algorithm to identify variant carriers. Finally, we conducted in silico experiments to find variant carriers among 200 simulated Escherichia coli strains. With the simulated pools and publicly available Illumina sequencing data, our method correctly identified the variant carriers for 91.5-97.9% of variants, with variant frequencies ranging from 0.5% to 1.5%.
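The pipeline described above (random k-set pooling, thresholding each pool, then decoding from read counts) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the pool count (40), k (5), the read depth (1000), and the threshold are all hypothetical choices made for illustration. Reads are idealized as noise-free. The exhaustive consistency search stands in for the paper's heuristic Bayesian decoder, which the abstract does not specify.

```python
import itertools
import random

def random_k_set_design(n_samples, n_pools, k, rng):
    """Place each sample in k distinct pools chosen uniformly at random."""
    return [rng.sample(range(n_pools), k) for _ in range(n_samples)]

def carriers_per_pool(design, carriers, n_pools):
    """Noise-free quantitative outcome: number of carriers in each pool."""
    counts = [0] * n_pools
    for sample in carriers:
        for pool in design[sample]:
            counts[pool] += 1
    return counts

def call_pools(variant_reads, total_reads, threshold):
    """Call a pool positive when its variant read fraction reaches the threshold."""
    return [t > 0 and v / t >= threshold for v, t in zip(variant_reads, total_reads)]

def candidate_carriers(design, positive):
    """Keep only samples whose pools are all positive."""
    return [s for s, pools in enumerate(design) if all(positive[p] for p in pools)]

def consistent_sets(design, candidates, counts, n_pools, k):
    """Exhaustively list candidate subsets that reproduce the per-pool counts."""
    n_carriers = sum(counts) // k  # each carrier contributes to exactly k pools
    return [subset for subset in itertools.combinations(candidates, n_carriers)
            if carriers_per_pool(design, subset, n_pools) == counts]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_samples, n_pools, k, depth = 200, 40, 5, 1000  # illustrative values only
design = random_k_set_design(n_samples, n_pools, k, rng)
carriers = sorted(rng.sample(range(n_samples), 2))  # 2 of 200 = 1% frequency

counts = carriers_per_pool(design, carriers, n_pools)
pool_sizes = [sum(p in pools for pools in design) for p in range(n_pools)]
# Idealized reads: variant reads proportional to the carrier fraction of the pool.
variant_reads = [round(depth * c / size) if size else 0
                 for c, size in zip(counts, pool_sizes)]
# Assumed threshold: half the smallest possible carrier fraction (1 / n_samples).
positive = call_pools(variant_reads, [depth] * n_pools, threshold=0.5 / n_samples)
# Quantitative step: convert read fractions back into carrier counts per pool.
estimated = [round(v * size / depth) for v, size in zip(variant_reads, pool_sizes)]

candidates = candidate_carriers(design, positive)
solutions = consistent_sets(design, candidates, estimated, n_pools, k)
print("true carriers:", carriers)
print("candidates:", candidates)
print("consistent carrier sets:", solutions)
```

With real sequencing data, read counts fluctuate and sequencing errors produce variant reads in negative pools. This is presumably why the paper selects its threshold from a depth distribution model and decodes probabilistically rather than by exact matching.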

CONCLUSIONS

Using the number of reads, variant carriers could be identified precisely even though samples were randomly selected and pooled. Our method performed better than the published DNA Sudoku design and compressed sequencing, especially in reducing the required data throughput and cost.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc42/4229885/3353f9031a34/1471-2105-15-195-1.jpg
