Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, 68198, USA.
BMC Genomics. 2020 Dec 29;21(Suppl 11):830. doi: 10.1186/s12864-020-07224-3.
Single-cell sequencing enables us to better understand genetic diseases, such as cancer or autoimmune disorders, which are often affected by changes in rare cells. No existing software is designed specifically to identify single nucleotide variations or micro (1-50 bp) insertions and deletions in single-cell RNA sequencing (scRNA-seq) data. Generating high-quality variant data is vital to the study of these diseases, among others.
In this study, we report the design and implementation of Red Panda, a novel method to accurately identify variants in scRNA-seq data. Variants were called on scRNA-seq data from human articular chondrocytes, mouse embryonic fibroblasts (MEFs), and simulated data stemming from the MEF alignments. Red Panda had the highest positive predictive value (PPV) at 45.0%, while the other tools (FreeBayes, GATK HaplotypeCaller, GATK UnifiedGenotyper, Monovar, and Platypus) ranged from 5.8% to 41.53%. On the simulated data, Red Panda had the highest sensitivity at 72.44%.
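As an illustration of the two metrics reported above, the following minimal sketch computes positive predictive value (PPV) and sensitivity by comparing a set of called variants against a truth set. It uses the standard definitions, PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN). The abstract does not describe the authors' validation procedure, so the function name `score_calls`, the variant representation, and the toy variants are all hypothetical and are not data from the study.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    ppv: float
    sensitivity: float


def score_calls(called: Set[Variant], truth: Set[Variant]) -> Metrics:
    """Score a call set against a truth set using exact-match comparison."""
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but not called
    ppv = tp / (tp + fp) if called else 0.0
    sensitivity = tp / (tp + fn) if truth else 0.0
    return Metrics(ppv, sensitivity)


# Toy example with made-up variants (not from the paper).
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 42, "G", "GA"),
    ("chr3", 7, "TT", "T"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr2", 42, "G", "GA"),
    ("chr5", 900, "C", "A"),  # false positive
}

metrics = score_calls(called, truth)
print(f"PPV={metrics.ppv:.3f} sensitivity={metrics.sensitivity:.3f}")
```

Exact tuple matching is the simplest possible comparison. A real benchmark would typically normalize indel representations first, since the same insertion or deletion can be written at different positions.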
We show that our method provides a novel and improved mechanism to identify variants in scRNA-seq data compared with existing software. However, methods for identifying genomic variants from scRNA-seq data can still be improved.