Caswell Jacob, Gans Jason D, Generous Nicholas, Hudson Corey M, Merkley Eric, Johnson Curtis, Oehmen Christopher, Omberg Kristin, Purvine Emilie, Taylor Karen, Ting Christina L, Wolinsky Murray, Xie Gary
Sandia National Laboratories, Albuquerque, NM, United States.
Los Alamos National Laboratory, Bioscience Division, Los Alamos, NM, United States.
Front Bioeng Biotechnol. 2019 Apr 5;7:58. doi: 10.3389/fbioe.2019.00058. eCollection 2019.
Progress in modern biology is being driven, in part, by the large amounts of freely available data in public resources such as the International Nucleotide Sequence Database Collaboration (INSDC), the world's primary database of biological sequence (and related) information. INSDC and similar databases have dramatically increased the pace of fundamental biological discovery and enabled a host of innovative therapeutic, diagnostic, and forensic applications. However, as high-value, openly shared resources with a high degree of assumed trust, these repositories bear compelling similarities to the Internet in its early days. Consequently, as public biological databases continue to increase in size and importance, we expect that they will face the same threats as undefended cyberspace. There is a unique opportunity, before a significant breach and loss of trust occurs, to ensure they evolve with quality and security as a design philosophy rather than through costly "retrofitted" mitigations. This Perspective surveys some potential quality assurance and security weaknesses in existing open genomic and proteomic repositories, describes methods to mitigate the likelihood of both intentional and unintentional errors, and offers recommendations for risk mitigation based on lessons learned from cybersecurity.