Bayer Crop Science, 700 Chesterfield Pkwy West, Chesterfield, MO 63017, United States.
J Invertebr Pathol. 2021 Nov;186:107587. doi: 10.1016/j.jip.2021.107587. Epub 2021 Apr 8.
Bioinformatic analyses of protein sequences play an important role in the discovery and subsequent safety assessment of insect control proteins in genetically modified (GM) crops. With the rapid adoption of high-throughput sequencing methods over the last decade, the number of protein sequences in GenBank and other public databases has increased dramatically. Many of these protein sequences are the product of whole-genome sequencing efforts coupled with automated protein sequence prediction and annotation pipelines. Published genome sequencing studies provide a rich and expanding foundation of new source organisms and proteins for insect control or other desirable traits in GM products. However, data generated by automated pipelines can also confound regulatory safety assessments that employ bioinformatics. For the most part, the problem lies not in the underlying sequence but in its annotation or associated metadata, and in the downstream integration of those data into existing repositories. Observations made during bioinformatic safety assessments are described.