Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China.
Beijing Advanced Innovation Center for Genomics (ICG), Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing 100871, China.
Nucleic Acids Res. 2023 Aug 25;51(15):8020-8034. doi: 10.1093/nar/gkad532.
Although long-read genome sequencing can achieve localized haplotype phasing without parental data, reliable chromosome-scale phasing remains a major challenge. Because sperm are naturally haploid cells, single-sperm genome sequencing can provide a chromosome-wide phasing signal. Limited by read length, current short-read-based single-sperm genome sequencing methods can haplotype only SNPs and have difficulty detecting and haplotyping structural variations (SVs) in complex genomic regions. To overcome these limitations, we developed a long-read-based single-sperm genome sequencing method and a corresponding data analysis pipeline that accurately identify crossover events and chromosome-level aneuploidies in single sperm and efficiently detect SVs within individual sperm cells. Importantly, without parental genome information, our method can accurately perform de novo phasing of heterozygous SVs as well as SNPs from male individuals at the whole-chromosome scale. The phasing accuracy for SVs was as high as 98.59% using 100 single sperm cells, and the phasing accuracy for SNPs was as high as 99.95%. Additionally, our method reliably deduced repeat expansions of haplotype-resolved STRs/VNTRs in single sperm cells. Our method provides a new opportunity for studying haplotype-related genetics in mammals.
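The idea that haploid sperm carry a chromosome-wide phasing signal can be illustrated with a toy example. The sketch below is not the authors' pipeline; it is a minimal majority-vote scheme under stated assumptions: each sperm cell reports allele 0 or 1 (or `None` where the site is not covered) at each heterozygous site of one chromosome, crossovers between neighbouring sites are rare, and genotyping errors are ignored. The function name `phase_by_adjacent_linkage` and the input layout are hypothetical.

```python
from typing import List, Optional

Allele = Optional[int]  # 0 or 1 at a heterozygous site; None = not covered in this cell


def phase_by_adjacent_linkage(cells: List[List[Allele]]) -> List[Optional[int]]:
    """Return, per heterozygous site, the allele assigned to haplotype A.

    Haplotype B carries the complementary allele. A site stays None when
    no cell links it informatively to the previously phased site.
    Assumption (illustrative only): most cells are non-recombinant between
    neighbouring sites, so a majority vote reveals which alleles travel together.
    """
    n_sites = len(cells[0]) if cells else 0
    phase: List[Optional[int]] = [None] * n_sites
    prev = None  # index of the most recently phased site
    for j in range(n_sites):
        if prev is None:
            # Anchor the first covered site: allele 0 defines haplotype A.
            if any(cell[j] is not None for cell in cells):
                phase[j] = 0
                prev = j
            continue
        same = diff = 0
        for cell in cells:
            a, b = cell[prev], cell[j]
            if a is None or b is None:
                continue  # this cell says nothing about the pair
            if a == b:
                same += 1
            else:
                diff += 1
        if same == diff:
            continue  # no evidence, or a tie: leave the site unphased
        # Majority "same" means equal allele codes share a haplotype;
        # majority "diff" means the codes are swapped at this site.
        phase[j] = phase[prev] if same > diff else 1 - phase[prev]
        prev = j
    return phase


# Toy data: true haplotypes are A = [0, 1, 1, 0] and B = [1, 0, 0, 1].
cells = [
    [0, 1, 1, 0],        # haplotype A
    [1, 0, 0, 1],        # haplotype B
    [0, None, 1, 0],     # haplotype A with one uncovered site
    [0, 1, 0, 1],        # recombinant: A up to the second site, then B
    [1, 0, None, 1],     # haplotype B with one uncovered site
]
haplotype_a = phase_by_adjacent_linkage(cells)
print(haplotype_a)
```

In this toy input the single recombinant cell is outvoted by the non-recombinant cells at the interval where it crosses over, which is why pooling many sperm cells makes the chromosome-scale phase robust. A real analysis would additionally need to model sequencing errors, allelic dropout, and variant types other than SNPs.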