Senf Alexander, Davies Robert, Haziza Frédéric, Marshall John, Troncoso-Pastoriza Juan, Hofmann Oliver, Keane Thomas M
European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK.
Enthought, Inc., 200 W Cesar Chavez, Suite 202, Austin, TX 78701, USA.
Bioinformatics. 2021 Sep 9;37(17):2753-2754. doi: 10.1093/bioinformatics/btab087.
The majority of genome analysis tools and pipelines require data to be decrypted before it can be accessed. This potentially leaves sensitive genetic data exposed, either because the unencrypted data is not removed after analysis or because the data leaves traces on the permanent storage medium.
We defined a file container specification that enables direct, byte-level-compatible random access to encrypted genetic data stored in community-standard formats such as SAM/BAM/CRAM/VCF/BCF. By standardizing this format, we show how it can be added as a native file format to genomic libraries, enabling direct analysis of encrypted data without the need to create a decrypted copy.
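Random access is possible because, per the Crypt4GH specification, the plaintext is not encrypted as one stream. It is split into independent 64 KiB segments, and each segment is stored as a 12-byte nonce, the ChaCha20-IETF-Poly1305 ciphertext, and a 16-byte MAC. A reader can therefore jump straight to the encrypted segment that holds any plaintext byte and decrypt only that segment. The Python sketch below shows this offset arithmetic only. The function names `locate` and `segments_for_range` are hypothetical, `header_len` is assumed to have been parsed from the file's header packets already, and the optional edit-list packet defined in the specification is ignored.

```python
from __future__ import annotations

# Data-block layout constants from the Crypt4GH v1 specification:
# each 64 KiB plaintext segment is stored as nonce || ciphertext || MAC.
SEGMENT_SIZE = 65536
NONCE_SIZE = 12
MAC_SIZE = 16
CIPHER_SEGMENT_SIZE = NONCE_SIZE + SEGMENT_SIZE + MAC_SIZE  # 65564 bytes


def locate(plain_offset: int, header_len: int) -> tuple[int, int, int]:
    """Map a plaintext byte offset to (segment index, file offset of that
    encrypted segment, offset of the byte inside the decrypted segment).

    Hypothetical helper: assumes header_len (total size of the Crypt4GH
    header) is already known and that no edit list is in use.
    """
    if plain_offset < 0:
        raise ValueError("offset must be non-negative")
    index, within = divmod(plain_offset, SEGMENT_SIZE)
    return index, header_len + index * CIPHER_SEGMENT_SIZE, within


def segments_for_range(start: int, length: int) -> range:
    """Indices of the encrypted segments that must be decrypted to read
    `length` plaintext bytes beginning at `start` (hypothetical helper)."""
    if length <= 0:
        return range(0)
    first = start // SEGMENT_SIZE
    last = (start + length - 1) // SEGMENT_SIZE
    return range(first, last + 1)
```

For example, with an assumed header length of 124 bytes, `locate(200000, 124)` returns `(3, 196816, 3392)`. The reader seeks to byte 196816 of the encrypted file, decrypts one 65564-byte segment, and skips 3392 bytes into the result. A tool such as an htslib-based BAM reader needs nothing more than this seek-and-decrypt step to follow an index into the encrypted file.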
The Crypt4GH specification can be found at http://samtools.github.io/hts-specs/crypt4gh.pdf.
Supplementary data are available at Bioinformatics online.