Department of Computer Science, Rice University, Houston, TX 77005, USA.
CINBIO, Universidade de Vigo, Vigo 36310, Spain.
Bioinformatics. 2022 Jun 24;38(Suppl 1):i195-i202. doi: 10.1093/bioinformatics/btac254.
Single-nucleotide variants (SNVs) are the most common variations in the human genome. Recently developed methods for SNV detection from single-cell DNA sequencing data, such as SCIΦ and scVILP, leverage the evolutionary history of the cells to overcome the technical errors associated with single-cell sequencing protocols. Despite being accurate, these methods are not scalable to the extensive genomic breadth of single-cell whole-genome (scWGS) and whole-exome sequencing (scWES) data.
Here, we report on a new scalable method, Phylovar, which extends the phylogeny-guided variant calling approach to sequencing datasets containing millions of loci. Through benchmarking on simulated datasets under different settings, we show that Phylovar outperforms SCIΦ in terms of running time while being more accurate than Monovar (which is not phylogeny-aware) in terms of SNV detection. Furthermore, we applied Phylovar to two real biological datasets: an scWES triple-negative breast cancer dataset consisting of 32 cells and 3375 loci, and an scWGS dataset of neuron cells from a normal human brain containing 16 cells and approximately 2.5 million loci. For the cancer dataset, Phylovar detected somatic SNVs with high or moderate functional impact that were also supported by a bulk sequencing dataset. For the neuron dataset, Phylovar identified 5745 SNVs with non-synonymous effects, some of which were associated with neurodegenerative diseases.
Phylovar is implemented in Python and is publicly available at https://github.com/NakhlehLab/Phylovar.