School of Sciences and Technology, University of Camerino, Via Madonna delle Carceri 7, 62032, Camerino, MC, Italy.
BMC Bioinformatics. 2023 Jun 15;23(Suppl 6):575. doi: 10.1186/s12859-023-05362-5.
BACKGROUND: Comparing RNA secondary structures is important for understanding their biological function and for grouping similar organisms into families on the basis of evolutionarily conserved sequences such as 16S rRNA. Most comparison methods and benchmarks in the literature focus on pseudoknot-free structures because pseudoknots are difficult to map onto classical tree representations. Some approaches can cluster pseudoknotted RNAs, but there is no general framework for evaluating their performance.
RESULTS: We introduce an evaluation framework that combines a similarity/dissimilarity measure, obtained from a comparison method, with agglomerative clustering. Together they automatically partition a set of molecules into groups. To illustrate the framework, we define and make available a benchmark of pseudoknotted (16S and 23S) and pseudoknot-free (5S) rRNA secondary structures belonging to Archaea, Bacteria and Eukaryota. We also consider five comparison methods from the literature that can handle pseudoknots. For each method, we cluster the molecules in the benchmark to recover the taxa at the rank of phylum according to the curated taxonomy of the European Nucleotide Archive. We compute appropriate metrics for each method and compare how well each reconstructs the taxa.
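The framework described in RESULTS (a pairwise dissimilarity matrix fed to agglomerative clustering, then scored against reference taxa) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration only: the toy distance values, the placeholder phylum labels, the choice of average linkage, and the use of the Rand index as the agreement metric. The abstract does not state which linkage or metrics the authors used.

```python
from itertools import combinations


def agglomerative(dist, n_clusters, linkage="average"):
    """Merge the two closest clusters until n_clusters remain.

    dist is a symmetric matrix (list of lists) of pairwise dissimilarities,
    e.g. produced by any structure comparison method.
    """
    clusters = [[i] for i in range(len(dist))]

    def cluster_distance(a, b):
        values = [dist[i][j] for i in a for j in b]
        if linkage == "single":
            return min(values)
        if linkage == "complete":
            return max(values)
        return sum(values) / len(values)  # average linkage

    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] += clusters[b]  # a < b, so deleting b keeps a valid
        del clusters[b]

    labels = [0] * len(dist)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy, made-up dissimilarities between five hypothetical molecules.
toy_dist = [
    [0, 1, 6, 7, 6],
    [1, 0, 7, 6, 7],
    [6, 7, 0, 2, 1],
    [7, 6, 2, 0, 2],
    [6, 7, 1, 2, 0],
]
# Placeholder reference taxa (not real phylum assignments).
reference = ["phylum_A", "phylum_A", "phylum_B", "phylum_B", "phylum_B"]

labels = agglomerative(toy_dist, n_clusters=len(set(reference)))
score = rand_index(labels, reference)
print(labels, score)
```

In this sketch the number of clusters is set to the number of distinct reference taxa, so the score reflects how well a given dissimilarity measure separates the known groups. Swapping `toy_dist` for the matrix produced by a different comparison method would allow the methods to be compared on the same footing.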