10x Genomics, Pleasanton, California 94566, USA.
The Institute of Cancer Research, Division of Genetics and Epidemiology, London SM2 5NG, United Kingdom.
Genome Res. 2019 Apr;29(4):635-645. doi: 10.1101/gr.234443.118. Epub 2019 Mar 20.
Large-scale population analyses coupled with advances in technology have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short-read whole-genome sequencing. However, these short-read approaches fail to give a complete picture of a genome. They struggle to identify structural events, cannot access repetitive regions, and fail to resolve the human genome into haplotypes. Here, we describe an approach that retains long-range information while maintaining the advantages of short reads. Starting from ∼1 ng of high-molecular-weight DNA, we produce barcoded short-read libraries. Novel informatic approaches allow the barcoded short reads to be associated with their original long molecules, producing a novel data type known as "Linked-Reads". This approach allows for simultaneous detection of small and large variants from a single library. In this manuscript, we show the advantages of Linked-Reads over standard short-read approaches for reference-based analysis. Linked-Reads allow mapping to 38 Mb of sequence not accessible to short reads, adding sequence in 423 difficult-to-sequence genes, including three disease-relevant genes. Both Linked-Read whole-genome and whole-exome sequencing identify complex structural variations, including balanced events and single-exon deletions and duplications. Further, Linked-Reads extend the region of high-confidence calls by 68.9 Mb. The data presented here show that Linked-Reads provide a scalable approach for comprehensive genome analysis that is not possible using short reads alone.
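The abstract does not spell out how barcoded short reads are associated with their original long molecules. As a rough illustration only, and not the authors' informatic method, the sketch below groups aligned reads that share a barcode and chromosome, then splits each group wherever consecutive reads lie further apart than a gap threshold. The names `Read`, `Molecule`, `infer_molecules`, and the 50 kb `max_gap` default are all hypothetical choices made for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """A simplified aligned read (hypothetical record for this sketch)."""
    barcode: str
    chrom: str
    pos: int


class Molecule(NamedTuple):
    """An inferred long molecule: a run of nearby reads sharing one barcode."""
    barcode: str
    chrom: str
    start: int
    end: int
    n_reads: int


def infer_molecules(reads, max_gap=50_000):
    """Group reads by (barcode, chromosome) and split on large gaps.

    max_gap is an assumed threshold, not a value taken from the paper.
    """
    positions_by_key = defaultdict(list)
    for read in reads:
        positions_by_key[(read.barcode, read.chrom)].append(read.pos)

    molecules = []
    for (barcode, chrom), positions in sorted(positions_by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule and start a new one.
                molecules.append(Molecule(barcode, chrom, start, prev, count))
                start = pos
                count = 0
            prev = pos
            count += 1
        molecules.append(Molecule(barcode, chrom, start, prev, count))
    return molecules


if __name__ == "__main__":
    example_reads = [
        Read("AAAC", "chr1", 1_000),
        Read("AAAC", "chr1", 20_000),
        Read("AAAC", "chr1", 500_000),  # far from the first two reads
        Read("GGTT", "chr1", 1_500),
    ]
    for molecule in infer_molecules(example_reads):
        print(molecule)
```

In this toy input, the two nearby `AAAC` reads form one inferred molecule, the distant `AAAC` read forms a second, and the `GGTT` read forms a third. A real pipeline would also need to handle barcode sequencing errors, multi-mapping reads, and distinct molecules that happen to share a barcode.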