Devonshire Alison S, Morata Jordi, Jubin Claire, Abreu Pereira Rui Pedro, Hernandez-Hernandez Laura, Yener Dilek, Cabannes Eric, McGinn Steven, Delepine Marc, Fund Cédric, Tonda Raúl, Heath Simon, Dabad Marc, Gutierrez-Cuesta Javier, Sanchez Escudero Ignacio, Frias-Lopez Maria Cristina, Cowen Simon, Whale Alexandra, Voss Thorsten, Deleuze Jean-François, Gut Ivo, Gut Marta, Foy Carole A
National Measurement Laboratory (hosted at LGC), The Priestley Centre, 10 Priestley Road, Guildford, Surrey, GU2 7XY, UK.
Centro Nacional de Análisis Genómico (CNAG), Baldiri Reixac 4, Barcelona, 08028, Spain.
BMC Genomics. 2025 Jul 28;26(1):698. doi: 10.1186/s12864-025-11792-7.
Long-read sequencing technologies enable resolution of structural variants (SVs) and long-range genome assembly, but require high molecular weight (HMW) DNA of both high quantity and high quality to produce optimal sequencing results. New DNA extraction methods have been developed, but these have not been assessed for use in routine testing. The interlaboratory study described here tested four commonly used methods (Fire Monkey, Nanobind, Puregene and Genomic-tip) using a reference cell line containing known chromosomal alterations. Samples were assessed with commonly applied approaches for evaluating DNA purity and integrity, as well as with a linkage-based method using digital PCR (dPCR). Sequencing performance was evaluated and the impact of extraction method on SV calling was investigated.
All methods generally produced samples of acceptable purity, although yield varied considerably between laboratories. Library preparation and sequencing were successful for all four methods, with Fire Monkey extracts achieving the highest N50 values, Genomic-tip giving the highest sequencing yields, and Nanobind giving the highest proportion of ultra-long reads (> 100 kb). The dPCR assay with duplexes at 100 kb and 150 kb distances was predictive of ultra-long reads and provides a more quantitative read-out (% linkage) than pulsed-field gel electrophoresis (PFGE), which varied in performance between instruments and gel dyes. Neither PFGE nor dPCR was predictive of the proportion of short reads (< 10 kb). Coverage was a key factor in the success of SV calling, but its influence depended on the SV caller. Megabase-scale SVs were challenging to analyse with SV callers and required confirmation based on coverage plots and mapping of junction sequences, and the findings of earlier studies were only partially confirmed.
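The read-length metrics named above (N50, proportion of ultra-long reads, proportion of short reads) can be illustrated with a minimal sketch. This is not the study's analysis pipeline: the function names are hypothetical, the input is assumed to be a plain list of read lengths in bases, and the proportions are assumed to be counted per read rather than per base, which the abstract does not specify.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the read length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("at least one read length is required")
    total_bases = sum(lengths)
    running = 0
    # Walk from the longest read downwards until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total_bases:
            return length
    raise AssertionError("unreachable for non-empty input")


def fraction_longer_than(lengths: Sequence[int], threshold: int) -> float:
    """Fraction of reads (by count, an assumption) strictly longer than threshold."""
    if not lengths:
        raise ValueError("at least one read length is required")
    return sum(1 for n in lengths if n > threshold) / len(lengths)


def fraction_shorter_than(lengths: Sequence[int], threshold: int) -> float:
    """Fraction of reads (by count, an assumption) strictly shorter than threshold."""
    if not lengths:
        raise ValueError("at least one read length is required")
    return sum(1 for n in lengths if n < threshold) / len(lengths)


# Made-up read lengths in bases, for illustration only.
example_lengths = [5_000, 8_000, 20_000, 60_000, 120_000, 150_000]
example_n50 = n50(example_lengths)
ultra_long = fraction_longer_than(example_lengths, 100_000)  # reads > 100 kb
short = fraction_shorter_than(example_lengths, 10_000)       # reads < 10 kb
```

Because N50 is weighted by bases while the two proportions here are weighted by read count, an extract can score well on one and poorly on the other, which is consistent with different extraction methods leading on different metrics.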
This study highlights some of the challenges of HMW DNA extraction, as well as the need for robust sample QC metrics to ensure optimal sequencing yield and read length, which in turn influence the success of SV analysis. dPCR approaches for DNA integrity showed potential but require further development. As long-read methods are increasingly applied in routine settings such as clinical testing laboratories, cellular reference samples with well-characterised SVs are recommended as controls for the full long-read sequencing workflow.
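As a rough illustration of how a % linkage read-out could be derived from a duplex dPCR experiment, the sketch below applies a standard Poisson partitioning model to the four partition counts (negative, target A only, target B only, both). The abstract does not give the study's own calculation, so the function name, the model, and the choice of denominator (mean of the two target concentrations) are all assumptions.

```python
import math


def percent_linkage(n_neg: int, n_a_only: int, n_b_only: int, n_both: int) -> float:
    """Estimate % linkage between two targets from duplex dPCR partition counts.

    Assumed model: free A, free B and linked A-B molecules are each distributed
    over partitions as independent Poisson variables.
    """
    total = n_neg + n_a_only + n_b_only + n_both
    if n_neg <= 0:
        raise ValueError("need at least one negative partition for a Poisson estimate")

    # Poisson: fraction of partitions lacking a target = exp(-mean copies per partition).
    lam_a = -math.log((n_neg + n_b_only) / total)  # free A + linked
    lam_b = -math.log((n_neg + n_a_only) / total)  # free B + linked
    lam_any = -math.log(n_neg / total)             # free A + free B + linked

    if lam_a + lam_b == 0:
        raise ValueError("no positive partitions; linkage is undefined")

    # (free A + linked) + (free B + linked) - (free A + free B + linked) = linked
    lam_linked = max(lam_a + lam_b - lam_any, 0.0)

    # Assumed denominator: mean concentration of the two targets.
    return 100.0 * lam_linked / ((lam_a + lam_b) / 2)


# Made-up counts: double positives occur exactly as often as chance predicts,
# so the estimate is close to 0 % linkage.
unlinked = percent_linkage(n_neg=250, n_a_only=250, n_b_only=250, n_both=250)

# Made-up counts: both targets always co-occur, so the estimate is 100 % linkage.
fully_linked = percent_linkage(n_neg=500, n_a_only=0, n_b_only=0, n_both=500)
```

The correction for chance co-occurrence matters because two unlinked targets still land in the same partition at a predictable rate. Counting raw double-positive partitions would therefore overstate integrity, more so at higher DNA input.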