Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
Wolfson Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Switchback Road, Glasgow, G61 1QH, UK.
Gigascience. 2021 Feb 16;10(2). doi: 10.1093/gigascience/giab007.
Since the original publication of the VCF and SAM formats, an explosion of software tools has been created to process these data files. To facilitate this, a library was produced out of the original SAMtools implementation, with a focus on performance and robustness. The file formats themselves have become international standards under the jurisdiction of the Global Alliance for Genomics and Health.
We present a software library that provides programmatic access to sequencing alignment and variant formats. It was born out of the widely used SAMtools and BCFtools applications. Considerable improvements have been made to the original code, and many new features have been added, including newer access protocols, support for the CRAM file format, better indexing and iterators, and better use of threading.
Since the original Samtools release, performance has been considerably improved, with a BAM read-write loop running 5 times faster and BAM to SAM conversion 13 times faster (both using 16 threads, compared to Samtools 0.1.19). Widespread adoption has seen HTSlib downloaded >1 million times from GitHub and conda. The C library has been used directly by an estimated 900 GitHub projects and has been incorporated into Perl, Python, Rust, and R, significantly expanding its use via other languages. HTSlib is open source and is freely available from htslib.org under the MIT/BSD license.