Zhang Lu, Zhou Xin, Weng Ziming, Sidow Arend
Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong.
Department of Pathology, 300 Pasteur Dr, Stanford University, Stanford, CA 94305, USA.
NAR Genom Bioinform. 2019 Dec 6;2(1):lqz018. doi: 10.1093/nargab/lqz018. eCollection 2020 Mar.
Detection of structural variants (SVs) on the basis of read alignment to a reference genome remains a difficult problem. De novo assembly, traditionally used to generate reference genomes, offers an alternative for SV detection. However, it has not been applied broadly to human genomes because of fundamental limitations of short-fragment approaches and the high cost of long-read technologies. Here we show that 10× linked-read sequencing supports accurate SV detection. We examined variants in six 10× assemblies with diverse experimental parameters from two commonly used human cell lines: NA12878 and NA24385. The assemblies are effective for detecting mid-size SVs, which were discovered by simple pairwise alignment of the assemblies' contigs to the reference (hg38). Our study also shows that base-pair-level SV breakpoint accuracy is high, with a majority of SVs having precisely correct sizes and breakpoints. Setting the ancestral state of SV loci by comparison to ape orthologs allows inference of the actual molecular mechanism (insertion or deletion) causing the mutation. In about half of the cases, the mechanism is the opposite of the reference-based call. We uncover 214 SVs that may have been maintained as polymorphisms in the human lineage since before our divergence from chimpanzee. Overall, we show that assembly of 10× linked-read data can achieve cost-effective SV detection for personal genomes.
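The polarization step described above (using the ape ortholog as the ancestral state to decide whether an SV arose by insertion or deletion) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `polarize_sv`, its string labels, and the single boolean `ape_has_sequence` are hypothetical simplifications, and the sketch assumes one unambiguous ape state per locus under simple parsimony.

```python
from typing import Optional, Tuple


def polarize_sv(reference_call: str,
                ape_has_sequence: Optional[bool]) -> Tuple[str, Optional[bool]]:
    """Infer the mutational mechanism of an SV from an assumed ancestral state.

    reference_call:   'insertion' if the sample carries sequence that the
                      reference lacks, 'deletion' if the sample lacks sequence
                      that the reference carries.
    ape_has_sequence: True/False if the ape ortholog does/does not carry the
                      sequence in question, None if no ortholog is available
                      (hypothetical encoding, assumed for illustration).

    Returns (mechanism, agrees_with_reference_call).
    """
    if reference_call not in ("insertion", "deletion"):
        raise ValueError("reference_call must be 'insertion' or 'deletion'")
    if ape_has_sequence is None:
        # Without an ancestral state the event cannot be polarized.
        return ("unresolved", None)
    # Sequence present in the ancestor: the allele lacking it is derived,
    # so the event was a deletion. Sequence absent in the ancestor: the
    # allele carrying it is derived, so the event was an insertion.
    mechanism = "deletion" if ape_has_sequence else "insertion"
    return (mechanism, mechanism == reference_call)


if __name__ == "__main__":
    # Sample lacks sequence found in the reference, and the ape lacks it too:
    # the reference lineage gained it, so the real event is an insertion.
    print(polarize_sv("deletion", False))
    # Sample carries extra sequence that the ape also lacks: a true insertion.
    print(polarize_sv("insertion", False))
```

Under this simplification the mechanism depends only on the ancestral state; the reference-based call merely determines whether the sample or the reference carries the derived allele, which is why the two labels can disagree.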