

HapCNV: A Comprehensive Framework for CNV Detection in Low-input DNA Sequencing Data.

Author information

Yu Xuanxuan, Qin Fei, Liu Shiwei, Brown Noah J, Lu Qing, Cai Guoshuai, Guler Jennifer L, Xiao Feifei

Affiliations

Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA.

Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD, 20850, USA.

Publication information

bioRxiv. 2025 Jan 7:2024.12.19.629494. doi: 10.1101/2024.12.19.629494.

Abstract

Copy number variants (CNVs) are prevalent in both diploid and haploid genomes; the latter contain a single copy of each gene. Studying CNVs in genomes from single cells or a few cells is significantly advancing our knowledge of human disorders and disease susceptibility. Low-input sequencing data, including low-cell and single-cell data from haploid and diploid organisms, generally display shallow and highly non-uniform read counts because the whole genome amplification step introduces amplification biases. In addition, haploid organisms typically have relatively short genomes and require a higher degree of DNA amplification than diploid organisms. However, most CNV detection methods were developed specifically for diploid genomes, without specific consideration of their effects on haploid genomes. Further challenges arise from the reference samples, or normal controls, that provide the baseline signal for defining copy number losses or gains. In traditional methods, references are usually pre-specified from cells assumed to be normal or disease-free; however, pre-defined reference cells can bias results if common CNVs are present. Here, we present HapCNV, a comprehensive statistical framework for data normalization and CNV detection in haploid single-cell or low-cell DNA sequencing data. Its key advance is a novel genomic location-specific pseudo-reference that selects unbiased references using a preliminary cell clustering method. This approach effectively preserves common CNVs. In simulations, HapCNV outperformed existing methods by producing more accurate CNV detection, especially for short CNVs. Its superior performance was also validated by detecting known CNVs in a real parasite dataset. In conclusion, HapCNV provides a novel and useful approach for CNV detection in haploid low-input sequencing datasets and is readily applicable to diploids.
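
The abstract argues that a reference built from all cells (or from pre-specified "normal" cells) can absorb a CNV shared by most cells, while a location-specific pseudo-reference chosen with the help of a preliminary clustering can preserve it. The sketch below illustrates that contrast only; it is not HapCNV's algorithm, whose normalization, clustering, and reference-selection rules are not described here. Everything in it is an assumption: the function names are hypothetical, counts are median-normalized per cell, cluster labels are taken as given, and at each bin the lowest cluster median is used as the baseline (on the assumption that, in a haploid genome, the lowest-depth cluster carries the single baseline copy).

```python
import math
from statistics import median


def normalize(counts):
    """Divide each cell's bin counts by that cell's median bin count (assumed scheme)."""
    out = []
    for cell in counts:
        m = median(cell) or 1.0  # guard against all-zero / near-empty cells
        out.append([c / m for c in cell])
    return out


def pseudo_reference(norm, labels):
    """Hypothetical location-specific reference: at each bin, take the median
    depth of every cluster and keep the lowest one as the baseline."""
    clusters = sorted(set(labels))
    ref = []
    for b in range(len(norm[0])):
        cluster_medians = [
            median(norm[i][b] for i in range(len(norm)) if labels[i] == k)
            for k in clusters
        ]
        ref.append(min(cluster_medians))
    return ref


def global_median_reference(norm):
    """Naive reference for comparison: per-bin median over all cells."""
    return [median(col) for col in zip(*norm)]


def log2_ratios(norm, ref, eps=1e-6):
    """Per-cell, per-bin log2(depth / reference); eps avoids log of zero."""
    return [[math.log2((x + eps) / (r + eps)) for x, r in zip(cell, ref)]
            for cell in norm]


# Toy data: 4 cells x 3 bins; cells 0-2 share a gain in bin 2 (a "common CNV").
counts = [[10, 10, 20], [10, 10, 20], [10, 10, 20], [10, 10, 10]]
labels = [0, 0, 0, 1]  # hypothetical output of a preliminary clustering step

norm = normalize(counts)
clustered = log2_ratios(norm, pseudo_reference(norm, labels))
naive = log2_ratios(norm, global_median_reference(norm))

print([round(v, 3) for v in clustered[0]])  # [0.0, 0.0, 1.0]: gain preserved
print([round(v, 3) for v in naive[0]])      # [0.0, 0.0, 0.0]: gain absorbed
```

In the toy example the all-cell median at bin 2 is itself the gained depth, so the naive ratio for the majority cells collapses to zero, whereas the cluster-aware baseline keeps a log2 ratio of about 1 (one extra copy over a single-copy baseline).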


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/873a/11727457/bcb7a930570a/nihpp-2024.12.19.629494v2-f0001.jpg
