Improving candidate Biosynthetic Gene Clusters in fungi through reinforcement learning.

Author information

Departement d'Informatique, UQAM, Montréal, QC H2X 3Y7, Canada.

Centre for Structural and Functional Genomics, Concordia University, Montréal, QC H4B 1R6, Canada.

Publication information

Bioinformatics. 2022 Aug 10;38(16):3984-3991. doi: 10.1093/bioinformatics/btac420.

Abstract

MOTIVATION

Precise identification of Biosynthetic Gene Clusters (BGCs) is a challenging task. Performance of BGC discovery tools is limited by their capacity to accurately predict components belonging to candidate BGCs, often overestimating cluster boundaries. To support optimizing the composition and boundaries of candidate BGCs, we propose a reinforcement learning approach relying on protein domains and functional annotations from expert-curated BGCs.

RESULTS

The proposed reinforcement learning method aims to improve candidate BGCs obtained with state-of-the-art tools. It was evaluated on candidate BGCs obtained for two fungal genomes, Aspergillus niger and Aspergillus nidulans. The results show that gene precision improved by more than 15% for TOUCAN, fungiSMASH and DeepBGC, and cluster precision by more than 25% for fungiSMASH and DeepBGC, allowing these tools to obtain almost perfect precision in cluster prediction. This can pave the way for optimizing current prediction of candidate BGCs in fungi, while minimizing the curation effort required by domain experts.
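As a rough illustration of what a gain in gene precision means, the sketch below applies the standard definition of precision (true positives over all predictions) at the gene level. This abstract does not describe the paper's exact evaluation protocol, so the function `gene_precision`, the gene identifiers and the example clusters are all hypothetical and serve only to show how trimming an overestimated cluster boundary raises precision.

```python
from typing import Iterable, Set


def gene_precision(predicted: Iterable[str], curated: Set[str]) -> float:
    """Fraction of predicted genes that also appear in the curated set.

    Standard precision, TP / (TP + FP); assumed here, not taken from the paper.
    """
    predicted_set = set(predicted)
    if not predicted_set:
        return 0.0
    true_positives = len(predicted_set & curated)
    return true_positives / len(predicted_set)


# Hypothetical gene identifiers, for illustration only.
curated_bgc = {"g3", "g4", "g5", "g6"}

# A candidate cluster whose boundaries are overestimated on the left side.
candidate = ["g1", "g2", "g3", "g4", "g5", "g6"]

# The same candidate after some refinement step removed two flanking genes
# but kept one gene that is not part of the curated cluster.
refined = ["g3", "g4", "g5", "g6", "g7"]

before = gene_precision(candidate, curated_bgc)
after = gene_precision(refined, curated_bgc)

print(f"gene precision before refinement: {before:.2f}")
print(f"gene precision after refinement:  {after:.2f}")
```

In this toy case four of six predicted genes are correct before refinement and four of five afterwards, so precision rises even though the refined cluster is still not perfect.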

AVAILABILITY AND IMPLEMENTATION

https://github.com/bioinfoUQAM/RL-bgc-components.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
