Deepdefense: annotation of immune systems in prokaryotes using deep learning.

Author information

Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg 79110, Germany.

Center for Applied and Translational Genomics (CATG), Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai Healthcare City, Al Razi St. P.O 505055, Dubai, United Arab Emirates.

Publication information

Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giae062.

Abstract

BACKGROUND

Due to a constant evolutionary arms race, archaea and bacteria have evolved an abundance and diversity of immune responses to protect themselves against phages. Since the discovery and application of CRISPR-Cas adaptive immune systems, numerous novel candidates for immune systems have been identified. Previous approaches to identifying these new immune systems rely on hidden Markov model (HMM)-based homolog searches or on labor-intensive and costly wet-lab experiments. To aid in finding and classifying immune systems in genomes, we use machine learning to classify already known immune system proteins and to discover potential candidates in the genome. Neural networks have shown promising results in classifying and predicting protein functionality in recent years. However, these methods often operate under the closed-world assumption, which presumes that all potential outcomes or classes are already known and included in the training dataset. This assumption does not always hold in real-world scenarios such as genomics, where new samples can emerge that were not accounted for in the training phase.

RESULTS

In this work, we explore neural networks for immune protein classification, evaluate different methods for rejecting unrelated proteins in a genome-wide search, and establish a benchmark. Then, we optimize our approach for accuracy. Based on this, we develop an algorithm called Deepdefense to predict immune cassette classes from a genome. This design facilitates the differentiation between immune system-related and unrelated proteins by analyzing variations in model-predicted confidence values, aiding in the identification of both known and potentially novel immune system proteins. Finally, we test our approach for detecting immune systems in the genome against an HMM-based method.
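
As an illustration of how confidence values can separate immune-related proteins from unrelated ones, the following is a minimal sketch of threshold-based rejection on classifier outputs. It is not the Deepdefense implementation: the class names, the threshold of 0.9, the temperature parameter (a common calibration technique, assumed here), and the function names are all hypothetical choices made for this example.

```python
import math

# Hypothetical class labels, chosen only for illustration.
CLASSES = ["cas", "restriction_modification", "abortive_infection"]


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; temperature > 1 softens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def classify_or_reject(logits, class_names, threshold=0.9, temperature=1.0):
    """Return (label, confidence); label is None when confidence is too low.

    A protein whose top probability falls below the (assumed) threshold is
    treated as unrelated to the known classes instead of being forced into one.
    """
    probs = softmax(logits, temperature)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    label = class_names[best] if confidence >= threshold else None
    return label, confidence


if __name__ == "__main__":
    # One clearly dominant score: accepted as a known class.
    print(classify_or_reject([6.0, 0.5, 0.2], CLASSES))
    # Nearly uniform scores: rejected as unrelated or unknown.
    print(classify_or_reject([1.0, 0.9, 0.8], CLASSES))
```

Under the closed-world assumption the second protein would still be assigned to its highest-scoring class; the rejection step is what allows a genome-wide scan to set aside proteins that match no known class well.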

CONCLUSIONS

Deepdefense can automatically detect genes and define cassette annotations and classifications using two classification models. This is achieved by combining an optimized deep learning model that annotates immune systems, together with calibration methods, with a second model that enables the scanning of an entire genome.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d958/11959188/4895d606bcc8/giae062fig1.jpg
