Laird Tyler S, Flyangolts Kevin, Bartling Craig, Gemler Bryan T, Beal Jacob, Mitchell Tom, Murphy Steven T, Berlips Jens, Foner Leonard, Doughty Ryan, Quintana Felix, Nute Michael, Treangen Todd J, Godbold Gene, Ternus Krista, Alexanian Tessa, Wheeler Nicole, Forry Samuel P
NIST, 100 Bureau Dr. Gaithersburg, MD 20899.
Aclid, 442 5th Ave #2300 New York, NY 10018.
bioRxiv. 2025 Jun 1:2025.05.30.655379. doi: 10.1101/2025.05.30.655379.
Nucleic acid synthesis is a dual-use technology that benefits fields such as biology, medicine, and information storage. However, synthetic nucleic acids could also cause harm, whether through negligent use or malicious intent, so the technology needs appropriate safeguards. Sequence screening is one component of a biosecurity protocol for preventing such harm; it consists of differentiating Sequences of Concern (SOCs) from benign sequences that are not associated with pathogenicity or toxicity. Many fit-for-purpose tools have been developed for DNA synthesis sequence screening, but questions remain about how consistently they screen. To help determine whether screening tools are harmonized with respect to baseline sequence screening, NIST constructed a test dataset based on current screening recommendations and sent blinded datasets to sequence screening tool developers for testing. Overall, the tools generally agreed with the NIST assignments of the sequences, and all tools had a baseline performance of greater than 95% sensitivity and 97% accuracy. Disagreement on specific sequences largely arose from single tools and could be traced to differences in defining a SOC and/or methodological differences in screening algorithms.
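For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how sensitivity and accuracy can be computed when a tool's calls are compared against reference assignments. The function name `screening_metrics`, the boolean encoding (True for SOC, False for benign), and the toy labels are assumptions made for illustration; they are not taken from the study's dataset or scoring procedure.

```python
def screening_metrics(reference, predicted):
    """Compare a tool's calls against reference labels.

    Both arguments are equal-length sequences of booleans, where True
    means "sequence of concern" and False means "benign". This encoding
    is a hypothetical convention chosen for this sketch.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        raise ValueError("at least one sequence is required")

    tp = fn = tn = fp = 0
    for ref_is_soc, flagged in zip(reference, predicted):
        if ref_is_soc and flagged:
            tp += 1  # SOC correctly flagged
        elif ref_is_soc and not flagged:
            fn += 1  # SOC missed by the tool
        elif not ref_is_soc and not flagged:
            tn += 1  # benign sequence correctly passed
        else:
            fp += 1  # benign sequence flagged

    # Sensitivity: fraction of reference SOCs that the tool flagged.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Accuracy: fraction of all sequences called the same as the reference.
    accuracy = (tp + tn) / len(reference)
    return {"tp": tp, "fn": fn, "tn": tn, "fp": fp,
            "sensitivity": sensitivity, "accuracy": accuracy}


# Toy labels only; these values are invented for the example.
reference = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, False, True]
metrics = screening_metrics(reference, predicted)
print(metrics["sensitivity"], metrics["accuracy"])
```

In a screening context sensitivity is usually the more safety-relevant of the two, since a false negative corresponds to a sequence of concern passing unflagged.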