Cardiovascular Research Institute, University of California-San Francisco, San Francisco, CA, 94143, USA.
School of Biomedical Engineering, Drexel University, Philadelphia, PA, 19104, USA.
Nat Commun. 2019 Mar 4;10(1):1025. doi: 10.1038/s41467-019-08992-7.
Large structural variants (SVs) in the human genome are difficult to detect and study by conventional sequencing technologies. With long-range genome analysis platforms, such as optical mapping, one can identify large SVs (>2 kb) across the genome in one experiment. Analyzing optical genome maps of 154 individuals from the 26 populations sequenced in the 1000 Genomes Project, we find that phylogenetic population patterns of large SVs are similar to those of single nucleotide variations in 86% of the human genome, while ~2% of the genome has high structural complexity. We are able to characterize SVs in many intractable regions of the genome, including segmental duplications and subtelomeric, pericentromeric, and acrocentric areas. In addition, we discover ~60 Mb of non-redundant genome content missing in the reference genome sequence assembly. Our results highlight the need for a comprehensive set of alternate haplotypes from different populations to represent SV patterns in the genome.