Steiert Tim Alexander, Fuß Janina, Juzenas Simonas, Wittig Michael, Hoeppner Marc Patrick, Vollstedt Melanie, Varkalaite Greta, ElAbd Hesham, Brockmann Christian, Görg Siegfried, Gassner Christoph, Forster Michael, Franke Andre
Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel 24105, Germany.
Institute for Digestive Research, Lithuanian University of Health Sciences, Kaunas 44307, Lithuania.
NAR Genom Bioinform. 2022 Jul 13;4(3):lqac051. doi: 10.1093/nargab/lqac051. eCollection 2022 Sep.
Hybridisation-based targeted enrichment is a widely used and well-established technique in high-throughput second-generation short-read sequencing. Third-generation sequencing, for example on PacBio platforms, has high potential to genetically resolve highly repetitive and variable genomic sequences, yet targeted enrichment of long fragments has not reached the same throughput because of complex existing workflows and technological dependencies. Here we describe a scalable targeted enrichment protocol for fragment sizes of >7 kb. For demonstration purposes, we developed a custom blood group panel of challenging loci. Test results achieved an on-target rate of >65%, good coverage (142.7×), sufficient coverage evenness for both non-paralogous and paralogous targets, and sufficient non-duplicate read counts (83.5%) per sample for a highly multiplexed enrichment pool of 16 samples. We genotyped the blood groups of nine patients using highly accurate phased assemblies at allelic resolution; the resulting genotypes matched reference blood group allele calls determined by SNP array and NGS genotyping. Seven Genome-in-a-Bottle reference samples achieved high recall (96%) and precision (99%). Mendelian error rates were 0.04% and 0.13% for the included Ashkenazim and Han Chinese trios, respectively. In summary, we provide a protocol and a first example of accurate targeted long-read sequencing that can be used in a high-throughput fashion.
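The abstract reports recall, precision and Mendelian error rates without stating how they were computed. As a minimal sketch, assuming the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), and Mendelian error rate = fraction of trio genotypes that cannot be explained by one allele from each parent), the metrics could be calculated as follows. The function names and the input representation are hypothetical and are not taken from the study's pipeline.

```python
from itertools import product


def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive
    and false-negative variant call counts (assumed definitions)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def is_mendelian_consistent(child, mother, father):
    """Check whether a diploid child genotype can be formed by taking
    one allele from the mother and one from the father.
    Genotypes are 2-tuples of allele labels, e.g. (0, 1)."""
    target = sorted(child)
    return any(sorted((m, f)) == target for m, f in product(mother, father))


def mendelian_error_rate(trio_genotypes):
    """Fraction of sites whose (child, mother, father) genotypes
    violate Mendelian inheritance. Assumes a non-empty list."""
    errors = sum(
        not is_mendelian_consistent(child, mother, father)
        for child, mother, father in trio_genotypes
    )
    return errors / len(trio_genotypes)


if __name__ == "__main__":
    # Illustrative numbers only; these are not data from the study.
    print(precision_recall(tp=8, fp=2, fn=2))
    sites = [
        ((0, 1), (0, 0), (1, 1)),  # consistent
        ((1, 1), (0, 0), (0, 1)),  # inconsistent: mother has no allele 1
    ]
    print(mendelian_error_rate(sites))
```

This sketch ignores de novo mutations, missing genotypes and sex chromosomes, all of which a production benchmarking tool would have to handle.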