Institute of Human Genetics, Polish Academy of Sciences, Strzeszyńska 32, 60-479, Poznań, Poland.
Department of Human Molecular Genetics, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, Uniwersytetu Poznańskiego 6, 61-614, Poznań, Poland.
BMC Bioinformatics. 2021 Oct 16;22(1):504. doi: 10.1186/s12859-021-04426-8.
The functions of RNA molecules are mainly determined by their secondary structures. These functions can also be predicted using bioinformatic tools that align multiple RNAs to determine functional domains and/or classify RNA molecules into RNA families. However, existing multiple RNA alignment tools that use structural information are slow when aligning long molecules and/or large numbers of molecules. A faster tool for multiple RNA alignment may therefore improve the classification of known RNAs and help to reveal the functions of newly discovered RNAs.
Here, we introduce an extremely fast Python-based tool called RNAlign2D. It converts RNA sequences to pseudo-amino acid sequences, which incorporate structural information, and uses a customizable scoring matrix to align these RNA molecules via the multiple protein sequence alignment tool MUSCLE.
RNAlign2D produces accurate RNA alignments in a very short time. The pseudo-amino acid substitution matrix approach used in RNAlign2D is applicable to virtually all protein aligners.
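The conversion idea described above can be illustrated with a minimal Python sketch. It is not the RNAlign2D implementation: the letter assignments in `PSEUDO_LETTERS`, the function names, and the scoring weights are all assumptions made for illustration. Each (dot-bracket symbol, nucleotide) pair is mapped to one amino-acid letter, and a substitution matrix is built that rewards agreement in structure more than agreement in nucleotide identity.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
STRUCTURE_SYMBOLS = "(.)"        # dot-bracket: opening pair, unpaired, closing pair
PSEUDO_LETTERS = "ACDEFGHIKLMN"  # 12 amino-acid letters; an arbitrary, hypothetical choice

# (structure symbol, nucleotide) -> pseudo-amino acid letter
ENCODE = dict(zip(product(STRUCTURE_SYMBOLS, NUCLEOTIDES), PSEUDO_LETTERS))
DECODE = {letter: pair for pair, letter in ENCODE.items()}


def to_pseudo_protein(sequence, structure):
    """Encode an RNA sequence and its dot-bracket structure as one string."""
    sequence = sequence.upper().replace("T", "U")
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return "".join(ENCODE[(s, n)] for s, n in zip(structure, sequence))


def from_pseudo_protein(aligned):
    """Decode an aligned pseudo-protein string, keeping '-' gap columns."""
    pairs = [("-", "-") if ch == "-" else DECODE[ch] for ch in aligned]
    sequence = "".join(n for _, n in pairs)
    structure = "".join(s for s, _ in pairs)
    return sequence, structure


def build_matrix(structure_match=2, structure_mismatch=-2, nucleotide_match=1):
    """Build a symmetric substitution matrix over the pseudo-amino acid letters.

    The default weights are illustrative assumptions, not RNAlign2D's values.
    """
    matrix = {}
    for a, b in product(PSEUDO_LETTERS, repeat=2):
        (sa, na), (sb, nb) = DECODE[a], DECODE[b]
        score = structure_match if sa == sb else structure_mismatch
        if na == nb:
            score += nucleotide_match
        matrix[(a, b)] = score
    return matrix


pseudo = to_pseudo_protein("GGGAAACCC", "(((...)))")
matrix = build_matrix()
```

In this sketch, the encoded strings could be written to a FASTA file and passed, together with the matrix, to a protein aligner; the aligned output would then be decoded with `from_pseudo_protein`. Symbols outside the 12 combinations handled here (for example ambiguous nucleotides or pseudoknot brackets) raise a `KeyError` and would need their own letters.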