Jayasooriya Kavindu, Jenner Sasha P, Marasinghe Pasindu, Senanayake Udith, Saadat Hassaan, Taubman David, Ragel Roshan, Gamaarachchi Hasindu, Deveson Ira W
Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales 2010, Australia.
Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children's Research Institute, Sydney, New South Wales 2010, Australia.
Genome Res. 2025 Jul 1;35(7):1574-1582. doi: 10.1101/gr.280090.124.
Nanopore sequencing is an increasingly central tool for genomics. Despite rapid advances in the field, large data volumes and computational bottlenecks continue to pose major challenges. Here, we introduce ex-zd, a new data compression strategy that helps address the large size of raw signal data generated during nanopore experiments. Ex-zd encompasses both a lossless compression method, which modestly outperforms all current methods for nanopore signal data compression, and a 'lossy' method, which can be used to achieve additional savings. The latter component works by reducing the number of bits used to encode signal data. We show that the three least significant bits in signal data generated on instruments from Oxford Nanopore Technologies (ONT) predominantly encode noise. Their removal reduces file sizes by half without impacting downstream analyses, including basecalling and detection of modified DNA or RNA bases. Ex-zd compression saves hundreds of gigabytes on a single ONT sequencing experiment, thereby increasing the scalability, portability, and accessibility of nanopore sequencing.
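The lossy component drops the three least significant bits of each raw signal sample. The sketch below shows what that does to individual samples. It assumes the signal is stored as integer ADC values, and the helper names `drop_lsbs` and `restore` are hypothetical. It illustrates bit reduction in general and is not the ex-zd encoder itself. Shifting right by 3 shrinks the value range by a factor of 8, so each sample needs 3 fewer bits. Restoring to the middle of each 8-value bucket keeps the per-sample error within a few ADC units.

```python
from typing import List


def drop_lsbs(signal: List[int], bits: int = 3) -> List[int]:
    """Discard the `bits` least significant bits of each sample.

    Python's >> floors toward negative infinity, so negative samples are
    handled consistently. (Hypothetical helper, not the ex-zd API.)
    """
    return [x >> bits for x in signal]


def restore(quantized: List[int], bits: int = 3) -> List[int]:
    """Rescale to the original range, placing each value mid-bucket.

    Adding half a bucket (4 when bits == 3) bounds the reconstruction
    error to the range [-3, 4] instead of [-7, 0].
    """
    half = (1 << bits) >> 1
    return [(q << bits) + half for q in quantized]


if __name__ == "__main__":
    raw = [512, 517, 523, 498, -5, 1000]  # made-up example samples
    q = drop_lsbs(raw)
    rec = restore(q)
    print(q)    # [64, 64, 65, 62, -1, 125]
    print(rec)  # [516, 516, 524, 500, -4, 1004]
    print(max(abs(r - x) for r, x in zip(rec, raw)))  # 4
```

The discarded bits are not stored anywhere. This is what makes the method lossy. The abstract reports that for ONT data these bits predominantly encode noise.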