Santos Renato, Lee Hyunah, Williams Alexander, Baffour-Kyei Anastasia, Lee Sang-Hyuck, Troakes Claire, Al-Chalabi Ammar, Breen Gerome, Iacoangeli Alfredo
Department of Biostatistics & Health Informatics, Institute of Psychiatry Psychology & Neuroscience, King's College London, 16 De Crespigny Park, London SE5 8AB, UK.
Social Genetic and Developmental Psychiatry Centre, Institute of Psychiatry Psychology & Neuroscience, King's College London, 16 De Crespigny Park, London SE5 8AB, UK.
Int J Mol Sci. 2025 May 8;26(10):4492. doi: 10.3390/ijms26104492.
Oxford Nanopore Technologies (ONT) long-read sequencing (LRS) has emerged as a promising genomic analysis tool, yet comprehensive benchmarks against established platforms across diverse datasets remain limited. This study aimed to benchmark LRS performance against Illumina short-read sequencing (SRS) and microarrays for variant detection across different genomic contexts and to evaluate the impact of experimental factors. We sequenced 14 human genomes using the three platforms and evaluated the detection of single-nucleotide variants (SNVs), insertions/deletions (indels), and structural variants (SVs), stratifying by high-complexity, low-complexity, and dark genome regions while assessing the effects of multiplexing, depth, and read length. LRS SNV accuracy was slightly lower than that of SRS in high-complexity regions (F-measure: 0.954 vs. 0.967) but showed comparable sensitivity in low-complexity regions. LRS showed robust performance for small (1-5 bp) indels in high-complexity regions (F-measure: 0.869), but its agreement with SRS decreased significantly in low-complexity regions and for larger indel sizes. Within dark regions, LRS identified more indels than SRS but showed lower base-level accuracy. LRS identified 2.86 times more SVs than SRS and excelled at detecting large variants (>6 kb), with SV detection improving as sequencing depth increased. Sequencing depth strongly influenced variant calling performance, whereas multiplexing effects were minimal. Our findings provide valuable insights for optimising LRS applications in genomic research and diagnostics.
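The F-measure values quoted above combine precision and sensitivity (recall) into one score. As a minimal sketch of how such a score can be derived from two call sets, the Python below compares a set of variant calls against a reference set by exact match. The function names, the `(chrom, pos, ref, alt)` tuple representation, and the toy variants are hypothetical illustrations, not the authors' pipeline, and the abstract does not state which platform served as the reference for each comparison.

```python
from typing import Set, Tuple

# Hypothetical representation of a variant: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def count_matches(reference: Set[Variant], calls: Set[Variant]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) by exact match."""
    tp = len(reference & calls)   # called and present in the reference set
    fp = len(calls - reference)   # called but absent from the reference set
    fn = len(reference - calls)   # in the reference set but not called
    return tp, fp, fn


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
reference_calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "GA"),
}
test_calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr3", 10, "T", "C"),
}

tp, fp, fn = count_matches(reference_calls, test_calls)
score = f_measure(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} F-measure={score:.3f}")
```

Exact-match comparison is a simplification: indels and SVs can be represented in several equivalent ways, so real benchmarks typically normalise variants or use dedicated comparison tools before counting matches.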