Raytheon BBN, 10 Moulton Street, Cambridge, MA, 02138, USA.
Integrated DNA Technologies, 1710 Commercial Park, Coralville, IA, 52241, USA.
Sci Rep. 2023 Apr 3;13(1):5390. doi: 10.1038/s41598-023-32481-z.
As synthetic biology becomes increasingly capable and accessible, it is likewise increasingly critical to be able to make accurate biosecurity determinations regarding the pathogenicity or toxicity of particular nucleic acid or amino acid sequences. At present, this is typically done using the BLAST algorithm to determine the best match with sequences in the NCBI nucleic acid and protein databases. Neither BLAST nor any of the NCBI databases, however, are actually designed for biosafety determination. Critically, taxonomic errors or ambiguities in the NCBI nucleic acid and protein databases can also cause errors in BLAST-based taxonomic categorization. With heavily studied taxa and frequently used biotechnology tools, even low frequency taxonomic categorization issues can lead to high rates of errors in biosecurity decision-making. Here we focus on the implications for false positives, finding that BLAST against NCBI's protein database will now incorrectly categorize a number of commonly used biotechnology tool sequences as the pathogens or toxins with which they have been used. Paradoxically, this implies that problems are expected to be most acute for the pathogens and toxins of highest interest and for the most widely used biotechnology tools. We thus conclude that biosecurity tools should shift away from BLAST against general purpose databases and towards new methods that are specifically tailored for biosafety purposes.
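The false-positive mechanism described above can be illustrated with a toy sketch. It is not the authors' method and does not call BLAST. It assumes alignment hits have already been computed and shows only the decision step. Every name in it is hypothetical and introduced for illustration: the `Hit` fields, the record identifiers, the taxon label "Pathogen X", and the `RECORDS_OF_CONCERN` list. The first function flags a query when its best-scoring hit carries a controlled taxon label. The second flags a query only when it matches a record that has been individually curated as a sequence of concern, which is one possible reading of a method "specifically tailored" to this purpose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    record_id: str   # hypothetical database accession
    taxon: str       # taxon label attached to that database record
    bitscore: float  # alignment score; higher means a better match


def flag_by_best_hit(hits, controlled_taxa):
    """Flag the query if its single best hit is labelled with a controlled taxon."""
    if not hits:
        return False
    best = max(hits, key=lambda h: h.bitscore)
    return best.taxon in controlled_taxa


def flag_by_curated_records(hits, records_of_concern, min_bitscore):
    """Flag the query only if it matches a record curated as a sequence of concern."""
    return any(
        h.bitscore >= min_bitscore and h.record_id in records_of_concern
        for h in hits
    )


# Hypothetical data, for illustration only.
CONTROLLED_TAXA = {"Pathogen X"}
RECORDS_OF_CONCERN = {"REC_TOXIN_1"}

# A common tool sequence (say, a reporter gene). Its best hit is a record
# deposited from an engineered strain, so the record carries the pathogen's
# taxon label even though the sequence itself is benign.
reporter_hits = [
    Hit("REC_CONSTRUCT_1", "Pathogen X", 500.0),
    Hit("REC_REPORTER_1", "synthetic construct", 498.0),
]

# A sequence that genuinely matches a curated record of concern.
toxin_hits = [Hit("REC_TOXIN_1", "Pathogen X", 700.0)]

print(flag_by_best_hit(reporter_hits, CONTROLLED_TAXA))                # True (false positive)
print(flag_by_curated_records(reporter_hits, RECORDS_OF_CONCERN, 50))  # False
print(flag_by_best_hit(toxin_hits, CONTROLLED_TAXA))                   # True
print(flag_by_curated_records(toxin_hits, RECORDS_OF_CONCERN, 50))     # True
```

In this toy case the reporter is flagged by the best-hit rule solely because of the taxon label on one database record, which is the kind of error the abstract attributes to taxonomic ambiguity in general-purpose databases. The curated-record rule avoids that particular error, but only to the extent that the curated list is itself accurate and complete.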