Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq.

Affiliations

Bioinformatics and Systems Biology Graduate Program, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA.

Integrative Biology Laboratory, Salk Institute for Biological Studies, 10010 N. Torrey Pines Road, La Jolla, CA 92037, USA.

Publication information

Nucleic Acids Res. 2021 Aug 20;49(14):7986-7994. doi: 10.1093/nar/gkab621.

Abstract

Genetic variants and de novo mutations in regulatory regions of the genome are typically discovered by whole-genome sequencing (WGS); however, WGS is expensive and most WGS reads come from non-regulatory regions. The Assay for Transposase-Accessible Chromatin (ATAC-seq) generates reads from regulatory sequences and could potentially be used as a low-cost 'capture' method for regulatory variant discovery, but its use for this purpose has not been systematically evaluated. Here we apply seven variant callers to bulk and single-cell ATAC-seq data and evaluate their ability to identify single nucleotide variants (SNVs) and insertions/deletions (indels). In addition, we develop an ensemble classifier, VarCA, which combines features from individual variant callers to predict variants. The Genome Analysis Toolkit (GATK) is the best-performing individual caller, with precision/recall on a bulk ATAC test dataset of 0.92/0.97 for SNVs and 0.87/0.82 for indels within ATAC-seq peak regions with at least 10 reads. On bulk ATAC-seq reads, VarCA achieves superior performance, with precision/recall of 0.99/0.95 for SNVs and 0.93/0.80 for indels. On single-cell ATAC-seq reads, VarCA attains precision/recall of 0.98/0.94 for SNVs and 0.82/0.82 for indels. In summary, ATAC-seq reads can be used to accurately discover non-coding regulatory variants in the absence of whole-genome sequencing data, and our ensemble method, VarCA, has the best overall performance.
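
The precision/recall pairs quoted above can be made concrete with a minimal sketch. It assumes that a called variant counts as correct only when its chromosome, position, reference allele and alternate allele all match an entry in a truth set exactly. The function name `precision_recall` and the toy variants are hypothetical illustrations, not the authors' evaluation pipeline or data.

```python
from typing import Iterable, Set, Tuple

# Assumption: a variant is identified by (chromosome, 1-based position, ref, alt).
Variant = Tuple[str, int, str, str]


def precision_recall(called: Iterable[Variant],
                     truth: Iterable[Variant]) -> Tuple[float, float]:
    """Return (precision, recall) of a call set against a truth set.

    precision = true positives / all called variants
    recall    = true positives / all true variants
    Empty denominators yield 0.0 instead of raising an error.
    """
    called_set: Set[Variant] = set(called)
    truth_set: Set[Variant] = set(truth)
    true_positives = len(called_set & truth_set)
    precision = true_positives / len(called_set) if called_set else 0.0
    recall = true_positives / len(truth_set) if truth_set else 0.0
    return precision, recall


# Hypothetical toy data: three calls match the truth set, one call is a
# false positive, and one true variant is missed.
toy_truth = [
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "GA", "G"),   # a deletion
    ("chr2", 90, "T", "C"),    # missed by the caller
]
toy_called = [
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "GA", "G"),
    ("chr3", 7, "G", "A"),     # false positive
]

toy_precision, toy_recall = precision_recall(toy_called, toy_truth)
print(f"precision={toy_precision:.2f} recall={toy_recall:.2f}")
```

Exact allele matching is a simplification: the same indel can be written in more than one equivalent way, so a real comparison would usually normalize variant representations first.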

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ab3/8373110/a083c2fa34cb/gkab621fig1.jpg
