Booeshaghi A Sina, Chen Xi, Pachter Lior
Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California.
School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.
bioRxiv. 2023 Jul 18:2023.03.17.533215. doi: 10.1101/2023.03.17.533215.
Understanding the structure of sequenced fragments from genomics libraries is essential for accurate read preprocessing. Currently, different assays and sequencing technologies require custom scripts and programs that do not leverage the common structure of sequence elements present in genomics libraries. We present a machine-readable specification for libraries produced by genomics assays that facilitates standardization of preprocessing and enables tracking and comparison of genomics assays. The specification and associated command-line tool are available at https://github.com/IGVF/seqspec.
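To illustrate what a machine-readable description of read structure makes possible, the following is a minimal sketch in Python. It does not use the seqspec file format or command-line tool. The `Region` class, the `READ1_LAYOUT` list, the `split_read` function, and the 16 bp barcode and 12 bp UMI lengths are all hypothetical, chosen only to show how a declared layout could replace assay-specific slicing code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Region:
    """One sequence element in a read (hypothetical; not the seqspec schema)."""
    name: str
    length: int


# Hypothetical layout: a 16 bp cell barcode followed by a 12 bp UMI.
READ1_LAYOUT: List[Region] = [
    Region("barcode", 16),
    Region("umi", 12),
]


def split_read(sequence: str, layout: List[Region]) -> Dict[str, str]:
    """Slice a read into named elements in the order given by the layout."""
    expected = sum(region.length for region in layout)
    if len(sequence) < expected:
        raise ValueError(
            f"read has {len(sequence)} bases, layout needs {expected}"
        )
    elements: Dict[str, str] = {}
    start = 0
    for region in layout:
        elements[region.name] = sequence[start:start + region.length]
        start += region.length
    return elements


# Example read built from a made-up barcode and UMI.
read = "ACGTACGTACGTACGT" + "TTTTGGGGCCCC"
parts = split_read(read, READ1_LAYOUT)
print(parts["barcode"], parts["umi"])
```

Because the layout is data rather than code, supporting a different assay in this sketch would mean supplying a different list of regions while leaving the slicing function unchanged.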