Department of Urology, University of Michigan Medical School, Ann Arbor, Michigan, USA.
Department of Medicine, University of Chicago, Chicago, Illinois, USA.
Hum Mol Genet. 2019 Nov 1;28(21):3569-3583. doi: 10.1093/hmg/ddz207.
Integrating single-cell RNA sequencing (scRNA-seq) data with genotypes obtained from DNA sequencing studies facilitates the detection of functional genetic variants underlying cell type-specific gene expression variation. Unfortunately, most existing scRNA-seq studies do not come with DNA sequencing data; thus, being able to call single nucleotide variants (SNVs) from scRNA-seq data alone can provide crucial and complementary information for detecting functional SNVs, maximizing the potential of existing scRNA-seq studies. Here, we perform extensive analyses to evaluate the utility of two SNV calling pipelines (GATK and Monovar), originally designed for SNV calling in bulk and single-cell DNA sequencing data, respectively. In both pipelines, we examined various parameter settings to determine their effect on the accuracy of the final SNV call set and to provide practical recommendations for applied analysts. We found that combining all reads from the single cells and following GATK Best Practices yielded the largest number of SNVs with high concordance. In individual single cells, Monovar produced better-quality SNVs, although neither pipeline was capable of calling a reasonable number of SNVs with high accuracy. In addition, we found that SNV calling quality varies across different functional genomic regions. Our results open the door to new ways of leveraging scRNA-seq for the future investigation of SNV function.
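The abstract does not define how concordance is computed, so the following is only a minimal sketch under stated assumptions: concordance is taken to be the fraction of called sites, among those that also have a reference genotype, where the two genotypes agree. The function name `genotype_concordance`, the site keys, and the toy genotypes are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

# A site is identified by (chromosome, 1-based position); hypothetical convention.
Site = Tuple[str, int]


def genotype_concordance(
    calls: Dict[Site, str], truth: Dict[Site, str]
) -> Tuple[int, float]:
    """Return (number of shared sites, fraction of shared sites that agree).

    Assumption: concordance is evaluated only at sites present in both
    the scRNA-seq call set and the reference genotype set.
    """
    shared = [site for site in calls if site in truth]
    if not shared:
        # No overlapping sites, so concordance is undefined.
        return 0, float("nan")
    matches = sum(1 for site in shared if calls[site] == truth[site])
    return len(shared), matches / len(shared)


# Toy data, invented purely for illustration.
calls = {("chr1", 100): "0/1", ("chr1", 250): "1/1", ("chr2", 75): "0/1"}
truth = {("chr1", 100): "0/1", ("chr1", 250): "0/1", ("chr3", 10): "1/1"}

n_shared, concordance = genotype_concordance(calls, truth)
print(f"shared sites: {n_shared}, concordance: {concordance:.2f}")
```

Reporting the number of shared sites alongside the fraction matters here, because a pipeline can reach high concordance while calling very few SNVs, which is the trade-off the abstract describes for calls made in individual cells.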