Mikhaylova Veronika, Rzepka Madison, Kawamura Tetsuya, Xia Yu, Chang Peter L, Zhou Shiguo, Pham Long, Modi Naisarg, Yao Likun, Perez-Agustin Adrian, Pagans Sara, Boles T Christian, Lei Ming, Wang Yong, Garcia-Bassets Ivan, Chen Zhoutao
Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA.
Sage Science Inc., Beverly, MA 01915, USA.
bioRxiv. 2023 Mar 6:2023.03.05.531179. doi: 10.1101/2023.03.05.531179.
In the human genome, heterozygous sites are genomic positions with different alleles inherited from each parent. On average, there is a heterozygous site every 1-2 kilobases (kb). Resolving whether two alleles at neighboring heterozygous positions are physically linked (that is, phased) is possible with a short-read sequencer if the sequencing library captures long-range information. TELL-Seq is a library preparation method based on millions of barcoded micro-sized beads that enables instrument-free phasing of a whole human genome in a single PCR tube. TELL-Seq adds a unique molecular identifier (barcode) to the short reads generated from the same high-molecular-weight (HMW) DNA fragment (known as 'linked-reads'). However, genome-scale TELL-Seq is not cost-effective for applications focusing on a single locus or a few loci. Here, we present an optimized TELL-Seq protocol that enables the cost-effective phasing of enriched loci (targets) of varying sizes, purity levels, and heterozygosity. Targeted TELL-Seq maximizes linked-read efficiency and library yield while minimizing input requirements, fragment collisions on microbeads, and sequencing burden. To validate the targeted protocol, we phased seven 180-200 kb loci enriched by CRISPR/Cas9-mediated excision coupled with pulsed-field electrophoresis, four 20 kb loci enriched by CRISPR/Cas9-mediated protection from exonuclease digestion, and six 2-13 kb loci amplified by PCR. The selected targets have clinical and research relevance (, , , , , , , -, and ). These analyses show that targeted TELL-Seq provides a reliable way of phasing allelic variants within targets 2-200 kb in length, with the low cost and high accuracy of short-read sequencing.
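The linked-read idea described above can be illustrated with a minimal sketch. It is not the TELL-Seq analysis pipeline: the read tuples, barcode names, positions, and the function `count_linked_allele_pairs` are hypothetical, and the sketch assumes that reads have already been aligned and that each barcode corresponds to a single HMW fragment (no fragment collisions on a bead). Under those assumptions, alleles at two heterozygous sites that repeatedly appear under the same barcode are inferred to lie on the same haplotype.

```python
from collections import Counter, defaultdict


def count_linked_allele_pairs(reads, site_a, site_b):
    """Count allele pairs observed under the same barcode at two sites.

    `reads` is an iterable of (barcode, position, allele) tuples; this
    layout is a hypothetical simplification of aligned linked-reads.
    """
    # Gather, per barcode, the allele seen at each site of interest.
    # Assumption: one fragment per barcode, so a later read simply
    # overwrites an earlier one at the same position.
    alleles_by_barcode = defaultdict(dict)
    for barcode, position, allele in reads:
        if position in (site_a, site_b):
            alleles_by_barcode[barcode][position] = allele

    # Only barcodes covering both sites carry phasing information.
    pairs = Counter()
    for alleles in alleles_by_barcode.values():
        if site_a in alleles and site_b in alleles:
            pairs[(alleles[site_a], alleles[site_b])] += 1
    return pairs


# Toy input: two heterozygous sites 1.5 kb apart (made-up values).
reads = [
    ("BC01", 1000, "A"), ("BC01", 2500, "G"),
    ("BC02", 1000, "C"), ("BC02", 2500, "T"),
    ("BC03", 1000, "A"), ("BC03", 2500, "G"),
    ("BC04", 1000, "C"),  # covers one site only, so it is uninformative
]
pairs = count_linked_allele_pairs(reads, 1000, 2500)
print(pairs.most_common())
```

In this toy input the pairs (A, G) and (C, T) are each supported by shared barcodes, which would suggest two haplotypes, A-G and C-T. A real analysis would also need to handle sequencing errors, barcode collisions, and chaining across more than two sites.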