Signature Science, LLC, Charlottesville, Virginia, USA.
Signature Science, LLC, Austin, Texas, USA.
Infect Immun. 2022 May 19;90(5):e0033421. doi: 10.1128/IAI.00334-21. Epub 2021 Nov 15.
To identify sequences with a role in microbial pathogenesis, we assessed how adequately existing controlled vocabularies and sequence databases annotate them. Our goal was to standardize descriptions of microbial pathogenesis for improved integration with bioinformatic applications. Here, we review the challenges of annotating sequences for pathogenic activity. We describe the categorization of more than 2,750 sequences from pathogenic microbes using a controlled vocabulary called Functions of Sequences of Concern (FunSoCs). These terms allow sequences to be described readily by both humans and machines. We provide a subset of 220 fully annotated sequences in the supplemental material as examples. This compact controlled vocabulary (∼30 terms) has potential benefits for research in microbial genomics, public health, biosecurity, biosurveillance, and the characterization of new and emerging pathogens.