Zhou Weichen, Mumm Camille, Gan Yanming, Switzenberg Jessica A, Wang Jinhao, De Oliveira Paulo, Kathuria Kunal, Losh Steven J, McDonald Torrin L, Bessell Brandt, Van Deynze Kinsey, McConnell Michael J, Boyle Alan P, Mills Ryan E
Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
bioRxiv. 2024 Dec 21:2024.12.18.629274. doi: 10.1101/2024.12.18.629274.
Somatic mutations in individual cells lead to genomic mosaicism, contributing to the intricate regulatory landscape of genetic disorders and cancers. To evaluate and refine the detection of somatic mosaicism across different technologies with a personalized donor-specific assembly (DSA), we obtained tissue from the dorsolateral prefrontal cortex (DLPFC) of a post-mortem neurotypical 31-year-old individual. We sequenced bulk DLPFC tissue using Oxford Nanopore Technologies (~60X), NovaSeq (~30X), and linked-read sequencing (~28X). Additionally, we applied a Cas9 capture methodology coupled with long-read sequencing (TEnCATS), targeting active transposable elements (TEs). We also isolated and amplified DNA from flow-sorted single DLPFC neurons using multiple annealing and looping-based amplification cycles (MALBAC), sequencing 115 of these MALBAC libraries on Nanopore and 94 on NovaSeq. We constructed a haplotype-resolved assembly with a total length of 5.77 Gb and a phase block N50 of 2.67 Mb to facilitate cross-platform analysis of somatic genetic variation. The phasing rate increased from 11.6% with short-read technologies to 38.0% with long-read technologies. After generating a catalog of phased germline SNVs, CNVs, and TEs from the assembled genome, we applied standard approaches to recall these variants across sequencing technologies. We achieved aggregated recall rates of 97.3% to 99.4% with long-read bulk tissue data, setting an upper bound for detection limits. Moreover, using haplotype-based analysis from the DSA, we reduced false-positive somatic calls in bulk tissue by 14.9% to 72.4%. We developed pipelines that leverage DSA information to improve the calling of large somatic genetic variants in long-read single-cell data. By examining somatic variation using long reads in 115 individual neurons, we identified 468 candidate somatic heterozygous large deletions (1.5 Mb to 20 Mb), 137 of which intersected with short-read single-cell data.
Additionally, we identified 61 putative somatic TEs (60 SINEs, one LINE-1) in the single-cell data. Collectively, our analysis spans from personalized assembly to single-cell somatic variant calling, providing a comprehensive approach and resource in real human tissue.