Using reinforcement learning in genome assembly: in-depth analysis of a Q-learning assembler.

Authors

Padovani Kleber, Borges Rafael Cabral, Xavier Roberto, Carvalho André Carlos, Reali Anna, Chateau Annie, Alves Ronnie

Affiliations

Center for Higher Studies of Itacoatiara, University of the State of Amazonas, Itacoatiara, Amazonas, Brazil.

Data Science, Vale Institute of Technology, Belém, Pará, Brazil.

Publication information

Front Bioinform. 2025 Aug 20;5:1633623. doi: 10.3389/fbinf.2025.1633623. eCollection 2025.

Abstract

Genome assembly remains an unsolved problem, and de novo strategies (i.e., those run without a reference) are relevant but computationally complex in genomics. Although de novo assemblers have been applied successfully in genomic projects, there is still no "best assembler", and the choice and setup of assemblers still rely on bioinformatics experts. Thus, as with other computationally complex problems, machine learning has emerged as an alternative (or complementary) way to develop accurate, fast and autonomous assemblers. Reinforcement learning has proven promising for solving complex activities without supervision, such as games, and there is a pressing need to understand the limits of this approach when applied to "real-life" problems, such as the DNA fragment assembly problem. In this study, we analyze the boundaries of applying reinforcement learning (RL) to genome assembly. We expand upon the previous approach found in the literature by carefully exploring the learning aspects of the proposed intelligent agent, which uses the Q-learning algorithm. We improved the reward system and optimized the exploration of the state space through pruning and in combination with evolutionary computing (>300% improvement). We tested the new approaches on 23 environments. Our results suggest that the approaches perform unsatisfactorily in terms of both assembly quality and execution time, providing strong evidence for the poor scalability of the studied reinforcement learning approaches to the genome assembly problem. Finally, we discuss the existing proposal, complemented by attempts at improvement that also proved insufficient. In doing so, we offer the scientific community a clear mapping of the limitations and challenges that should be taken into account in future attempts to apply reinforcement learning to genome assembly.
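
For readers unfamiliar with the setup the abstract describes, the following is a minimal sketch of tabular Q-learning applied to a toy read-ordering task. It is not the authors' implementation: the state encoding (the tuple of reads placed so far), the reward (the suffix-prefix overlap with the previously placed read), the hyperparameters, and the function names `overlap`, `train`, `greedy_order` and `merge` are all assumptions made for illustration. The pruning and evolutionary-computing components mentioned in the abstract are not modelled.

```python
import random
from collections import defaultdict


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def train(reads, episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over read orderings (illustrative assumptions only).

    State: tuple of read indices placed so far. Action: index of the next read.
    Reward: overlap between the previously placed read and the chosen one.
    """
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (state, action) to its estimated value
    n = len(reads)
    for _ in range(episodes):
        state = ()
        while len(state) < n:
            actions = [i for i in range(n) if i not in state]
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda i: q[(state, i)])
            reward = overlap(reads[state[-1]], reads[action]) if state else 0
            nxt = state + (action,)
            remaining = [i for i in range(n) if i not in nxt]
            best_next = max((q[(nxt, i)] for i in remaining), default=0.0)
            # Standard Q-learning update.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = nxt
    return q


def greedy_order(q, n):
    """Follow the highest-valued action from the empty state."""
    state = ()
    while len(state) < n:
        actions = [i for i in range(n) if i not in state]
        state += (max(actions, key=lambda i: q[(state, i)]),)
    return list(state)


def merge(reads, order):
    """Concatenate reads in `order`, collapsing consecutive overlaps."""
    contig = reads[order[0]]
    for prev, cur in zip(order, order[1:]):
        contig += reads[cur][overlap(reads[prev], reads[cur]):]
    return contig


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "TACGGA", "GGATTC"]  # hypothetical reads
    q_table = train(toy_reads)
    order = greedy_order(q_table, len(toy_reads))
    print(order, merge(toy_reads, order))
```

With these three toy reads the learned greedy policy places them in the order 0, 1, 2 and merges them into `ACGTACGGATTC`. The number of states in this encoding grows factorially with the number of reads, which gives some intuition for the scalability concerns the abstract raises.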

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/33b7/12405310/581b8e96a4d2/fbinf-05-1633623-g001.jpg
