Biostatistics and Bioinformatics Staff, Office of Analytics and Outreach, Center for Food Safety and Applied Nutrition, US Food and Drug Administration, College Park, Maryland, USA.
Signals Team, Coordinated Outbreak Response and Evaluation Network, Center for Food Safety and Applied Nutrition, US Food and Drug Administration, College Park, Maryland, USA.
Clin Infect Dis. 2021 Oct 20;73(8):1537-1539. doi: 10.1093/cid/ciab615.
Open-source DNA sequence databases have long been touted as beneficial to public health, including by facilitating earlier detection of, and response to, infectious disease outbreaks. Of critical importance to harnessing these benefits are the metadata that describe general and other domain-specific attributes (eg, collection location, isolate type) of a sample. Unlike the sequence data, metadata are often incomplete and do not adhere to an international standard. Here, we describe the problem posed by such variable and incomplete metadata in terms of interpretative labor costs (the time and energy necessary to make sense of the signal in the genetic data) and the impact such metadata have on foodborne outbreak detection and response. Improving the quality of sequence-associated metadata would enable earlier detection of emerging food safety hazards and faster response to foodborne outbreaks.