Enhancement of accuracy and efficiency for RNA secondary structure prediction by sequence segmentation and MapReduce.

Authors

Zhang Boyu, Yehdego Daniel T, Johnson Kyle L, Leung Ming-Ying, Taufer Michela

Publication information

BMC Struct Biol. 2013;13 Suppl 1(Suppl 1):S3. doi: 10.1186/1472-6807-13-S1-S3. Epub 2013 Nov 8.

DOI: 10.1186/1472-6807-13-S1-S3
PMID: 24564983
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3952952/
Abstract

BACKGROUND

Ribonucleic acid (RNA) molecules play important roles in many biological processes, including gene expression and regulation. Their secondary structures are crucial for RNA function, and the prediction of these structures is widely studied. Our previous research shows that cutting long sequences into shorter chunks, predicting the secondary structures of the chunks independently using thermodynamic methods, and reconstructing the entire secondary structure from the predicted chunk structures can yield better accuracy than predicting the secondary structure from the RNA sequence as a whole. The chunking, prediction, and reconstruction processes can use different methods and parameters, some of which produce more accurate predictions than others. In this paper, we study the prediction accuracy and efficiency of three chunking methods combined with seven popular secondary structure prediction programs. We apply them to two datasets of RNA with known secondary structures, which include both pseudoknotted and non-pseudoknotted sequences, and to a family of viral genome RNAs whose structures have not been predicted before. Our modularized MapReduce framework based on Hadoop allows us to study the problem in a parallel and robust environment.
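
The chunk, predict, and reconstruct workflow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the fixed-length chunking rule, the function names, and the toy predictor (which only pairs complementary bases from the two ends of a chunk inward) are assumptions made for illustration. The study itself uses three chunking methods and seven established prediction programs.

```python
from typing import Callable, List

# Canonical and wobble base pairs accepted by the toy predictor (assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def chunk_fixed(seq: str, max_len: int) -> List[str]:
    """Split seq into consecutive, non-overlapping chunks of at most max_len bases."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [seq[i:i + max_len] for i in range(0, len(seq), max_len)]


def toy_predict(chunk: str) -> str:
    """Hypothetical stand-in for a real prediction program.

    Pairs bases from both ends inward while they are complementary,
    always leaving at least three unpaired bases in the loop.
    """
    struct = ["."] * len(chunk)
    i, j = 0, len(chunk) - 1
    while j - i > 3 and (chunk[i], chunk[j]) in PAIRS:
        struct[i], struct[j] = "(", ")"
        i += 1
        j -= 1
    return "".join(struct)


def predict_by_chunks(seq: str, max_len: int,
                      predict: Callable[[str], str] = toy_predict) -> str:
    """Predict each chunk independently and concatenate the dot-bracket strings."""
    return "".join(predict(c) for c in chunk_fixed(seq, max_len))
```

With this toy predictor, `predict_by_chunks("GGGAAACCCGGGAAACCC", 9)` returns `(((...)))(((...)))`, while `toy_predict` on the whole sequence returns `(((............)))`. Because each chunk is predicted independently, every base pair stays inside its chunk and the concatenated string is a valid dot-bracket structure.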

RESULTS

On average, the maximum accuracy retention values are larger than one for our chunking methods and the seven prediction programs over 50 non-pseudoknotted sequences, meaning that the secondary structure predicted using chunking is more similar to the real structure than the secondary structure predicted by using the whole sequence. We observe similar results for the 23 pseudoknotted sequences, except for the NUPACK program using the centered chunking method. The performance analysis for 14 long RNA sequences from the Nodaviridae virus family outlines how the coarse-grained mapping of chunking and predictions in the MapReduce framework exhibits shorter turnaround times for short RNA sequences. However, as the lengths of the RNA sequences increase, the fine-grained mapping can surpass the coarse-grained mapping in performance.
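
The abstract implies that accuracy retention compares the accuracy of the chunk-based prediction with that of the whole-sequence prediction, with values above one favoring chunking. A minimal sketch of such a ratio follows. The accuracy measure used here (F1 score over base pairs in pseudoknot-free dot-bracket notation) and all function names are assumptions; the abstract does not state which accuracy measure the authors use.

```python
from typing import Set, Tuple


def base_pairs(dot_bracket: str) -> Set[Tuple[int, int]]:
    """Return the set of (i, j) base pairs in a pseudoknot-free dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def f1_accuracy(predicted: str, reference: str) -> float:
    """F1 score of predicted base pairs against the reference (assumed measure)."""
    p, r = base_pairs(predicted), base_pairs(reference)
    if not p and not r:
        return 1.0
    return 2 * len(p & r) / (len(p) + len(r))


def accuracy_retention(chunk_pred: str, whole_pred: str, reference: str) -> float:
    """Ratio of chunk-based accuracy to whole-sequence accuracy."""
    whole = f1_accuracy(whole_pred, reference)
    if whole == 0:
        raise ValueError("whole-sequence accuracy is zero; ratio undefined")
    return f1_accuracy(chunk_pred, reference) / whole
```

For example, with reference `((..))....`, a chunk-based prediction `((..))....`, and a whole-sequence prediction `(....)....`, the ratio is 1.0 divided by 2/3, or about 1.5, so the chunk-based prediction is closer to the reference.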

CONCLUSIONS

By using our MapReduce framework together with statistical analysis on the accuracy retention results, we observe how the inversion-based chunking methods can outperform predictions using the whole sequence. Our chunk-based approach also enables us to predict secondary structures for very long RNA sequences, which is not feasible with traditional methods alone.
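
The difference between coarse-grained and fine-grained mapping can be sketched in plain Python without Hadoop. The sketch assumes that coarse-grained mapping assigns one whole sequence to each map task, which chunks it and predicts all of its chunks, while fine-grained mapping assigns one chunk to each map task. The abstract does not spell this out, so treat that reading, the function names, and the placeholder predictor as assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# A map output record: (sequence id, (chunk index, predicted structure)).
Record = Tuple[str, Tuple[int, str]]


def unpaired(chunk: str) -> str:
    """Placeholder predictor (assumption): marks every base as unpaired."""
    return "." * len(chunk)


def split(seq: str, size: int) -> List[str]:
    """Fixed-length chunking, used here only to keep the sketch short."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def map_coarse(seq_id: str, seq: str, size: int,
               predict: Callable[[str], str] = unpaired) -> List[Record]:
    """Coarse-grained: one map task chunks a sequence and predicts every chunk."""
    return [(seq_id, (k, predict(c))) for k, c in enumerate(split(seq, size))]


def map_fine(seq_id: str, k: int, chunk: str,
             predict: Callable[[str], str] = unpaired) -> List[Record]:
    """Fine-grained: one map task predicts a single chunk."""
    return [(seq_id, (k, predict(chunk)))]


def shuffle(records: Iterable[Record]) -> Dict[str, List[Tuple[int, str]]]:
    """Group map outputs by sequence id, as the framework would before reducing."""
    grouped = defaultdict(list)
    for key, value in records:
        grouped[key].append(value)
    return dict(grouped)


def reduce_structure(values: List[Tuple[int, str]]) -> str:
    """Reassemble one sequence's structure from its chunks, ordered by index."""
    return "".join(s for _, s in sorted(values))


def run(seqs: Dict[str, str], size: int, fine: bool,
        predict: Callable[[str], str] = unpaired) -> Dict[str, str]:
    """Run map, shuffle, and reduce sequentially for either granularity."""
    records: List[Record] = []
    for seq_id, seq in seqs.items():
        if fine:
            for k, c in enumerate(split(seq, size)):
                records.extend(map_fine(seq_id, k, c, predict))
        else:
            records.extend(map_coarse(seq_id, seq, size, predict))
    return {sid: reduce_structure(v) for sid, v in shuffle(records).items()}
```

Both modes return the same structures. They differ only in how the work is divided among map tasks, which is the property that affects turnaround time as sequences grow longer.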
