Department of Computational Biology and Medical Sciences, University of Tokyo, Chiba 277-8561, Japan.
Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad255.
To capture structural homology in RNAs, alignment and folding (AF) of RNA homologs has been a fundamental framework in RNA science. Learning sufficient scoring parameters for simultaneous AF (SAF) is an underdeveloped subject because evaluating them is computationally expensive.
We developed ConsTrain, a gradient-based machine learning method for rich SAF scoring. We also implemented ConsAlign, an SAF tool built on ConsTrain's learned scoring parameters. To improve AF quality, ConsAlign employs (1) transfer learning from well-defined scoring models and (2) an ensemble of the ConsTrain model and a well-established thermodynamic scoring model. While keeping running times comparable, ConsAlign demonstrated AF prediction quality competitive with that of current AF tools.
Our code and data are freely available at https://github.com/heartsh/consalign and https://github.com/heartsh/consprob-trained.