Dinucleotide controlled null models for comparative RNA gene prediction.

Authors

Gesell Tanja, Washietl Stefan

Affiliation

Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria.

Publication

BMC Bioinformatics. 2008 May 27;9:248. doi: 10.1186/1471-2105-9-248.

DOI:10.1186/1471-2105-9-248
PMID:18505553
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2453142/
Abstract

BACKGROUND

Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is a need for dinucleotide-preserving control strategies to assess the significance of such predictions. While randomization algorithms for single sequences have existed for many years, the problem has remained challenging for multiple alignments, and there is currently no algorithm available.

RESULTS

We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance-based approach, a tree is estimated under this model, which is then used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments, and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm, giving a new variant of a thermodynamic structure-based RNA gene finding program that is not biased by the dinucleotide content.
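To make the quantity being preserved concrete, the following is a minimal Python sketch of how the dinucleotide content of an alignment might be measured. It is not the SISSIz implementation: the function name `dinucleotide_frequencies` is hypothetical, gaps are simply removed before adjacent pairs are counted, and counts are pooled over all sequences, all of which are assumptions made for illustration.

```python
from collections import Counter


def dinucleotide_frequencies(alignment):
    """Pooled dinucleotide frequencies over all rows of an alignment.

    `alignment` is a list of equal-length strings with '-' for gaps.
    Assumption: gaps are stripped first, so neighbours across a gap
    are counted as adjacent.
    """
    counts = Counter()
    for row in alignment:
        seq = row.replace("-", "").upper()
        # Count every overlapping pair of neighbouring bases.
        for first, second in zip(seq, seq[1:]):
            counts[first + second] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


if __name__ == "__main__":
    toy_alignment = ["ACGU", "A-GU"]
    for pair, freq in sorted(dinucleotide_frequencies(toy_alignment).items()):
        print(pair, round(freq, 3))
```

Comparing this table between an original alignment and its randomized counterparts is one simple way to check that a null model keeps the dinucleotide content, on average, unchanged.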

CONCLUSION

SISSIz implements an efficient algorithm to randomize multiple alignments while preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as a standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered.
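One common way to use randomized alignments as a null model is to compare the score of the native alignment with the scores of its randomized versions. The Python sketch below assumes that a score (for example, a consensus folding energy, where lower means more stable) has already been computed for each alignment by some external tool. The function names and the example numbers are hypothetical, and the abstract does not state that SISSIz computes its statistics exactly this way.

```python
import statistics


def z_score(native_score, null_scores):
    """Standard deviations between the native score and the null mean."""
    mean = statistics.mean(null_scores)
    sd = statistics.stdev(null_scores)  # sample standard deviation
    if sd == 0:
        raise ValueError("null scores have zero variance")
    return (native_score - mean) / sd


def empirical_p_value(native_score, null_scores):
    """Fraction of null scores at least as low as the native score.

    A pseudo-count of one keeps the estimate above zero when no
    randomized alignment scores as low as the native one.
    """
    hits = sum(1 for s in null_scores if s <= native_score)
    return (hits + 1) / (len(null_scores) + 1)


if __name__ == "__main__":
    # Made-up scores for four randomized alignments.
    null = [-10.0, -12.0, -8.0, -10.0]
    print(z_score(-20.0, null))
    print(empirical_p_value(-20.0, null))
```

A strongly negative z-score then suggests that the native alignment folds more stably than expected from dinucleotide content, sequence diversity, and gap pattern alone.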

AVAILABILITY

SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz.


Similar articles

1
Dinucleotide controlled null models for comparative RNA gene prediction.
BMC Bioinformatics. 2008 May 27;9:248. doi: 10.1186/1471-2105-9-248.
2
Specific alignment of structured RNA: stochastic grammars and sequence annealing.
Bioinformatics. 2008 Dec 1;24(23):2677-83. doi: 10.1093/bioinformatics/btn495. Epub 2008 Sep 16.
3
Multiperm: shuffling multiple sequence alignments while approximately preserving dinucleotide frequencies.
Bioinformatics. 2009 Mar 1;25(5):668-9. doi: 10.1093/bioinformatics/btp006. Epub 2009 Jan 9.
4
Energy-based RNA consensus secondary structure prediction in multiple sequence alignments.
Methods Mol Biol. 2014;1097:125-41. doi: 10.1007/978-1-62703-709-9_7.
5
Considerations in the identification of functional RNA structural elements in genomic alignments.
BMC Bioinformatics. 2007 Jan 30;8:33. doi: 10.1186/1471-2105-8-33.
6
CentroidAlign: fast and accurate aligner for structured RNAs by maximizing expected sum-of-pairs score.
Bioinformatics. 2009 Dec 15;25(24):3236-43. doi: 10.1093/bioinformatics/btp580. Epub 2009 Oct 6.
7
RNAz 2.0: improved noncoding RNA detection.
Pac Symp Biocomput. 2010:69-79.
8
A local multiple alignment method for detection of non-coding RNA sequences.
Bioinformatics. 2009 Jun 15;25(12):1498-505. doi: 10.1093/bioinformatics/btp261. Epub 2009 Apr 17.
9
Pair hidden Markov models on tree structures.
Bioinformatics. 2003;19 Suppl 1:i232-40. doi: 10.1093/bioinformatics/btg1032.
10
Chain-RNA: a comparative ncRNA search tool based on the two-dimensional chain algorithm.
IEEE/ACM Trans Comput Biol Bioinform. 2013 Mar-Apr;10(2):274-85. doi: 10.1109/TCBB.2012.137.

Cited by

1
ECSFinder: optimized prediction of evolutionarily conserved RNA secondary structures from genome sequences.
Nucleic Acids Res. 2025 Aug 11;53(15). doi: 10.1093/nar/gkaf780.
2
Computational discovery of conserved RNA structures and functional characterization of a structured lncRNA in .
Noncoding RNA Res. 2025 May 20;14:51-64. doi: 10.1016/j.ncrna.2025.05.010. eCollection 2025 Oct.
3
Comparative RNA Genomics.
Methods Mol Biol. 2024;2802:347-393. doi: 10.1007/978-1-0716-3838-5_12.
4
Discovery of a non-canonical GRHL1 binding site using deep convolutional and recurrent neural networks.
BMC Genomics. 2023 Dec 4;24(1):736. doi: 10.1186/s12864-023-09830-3.
5
Identification of over ten thousand candidate structured RNAs in viruses and phages.
Comput Struct Biotechnol J. 2023 Nov 7;21:5630-5639. doi: 10.1016/j.csbj.2023.11.010. eCollection 2023.
6
Tailored machine learning models for functional RNA detection in genome-wide screens.
NAR Genom Bioinform. 2023 Aug 21;5(3):lqad072. doi: 10.1093/nargab/lqad072. eCollection 2023 Sep.
7
Evolutionary Conservation of RNA Secondary Structure.
Methods Mol Biol. 2023;2586:121-146. doi: 10.1007/978-1-0716-2768-6_8.
8
ScanFold 2.0: a rapid approach for identifying potential structured RNA targets in genomes and transcriptomes.
PeerJ. 2022 Nov 8;10:e14361. doi: 10.7717/peerj.14361. eCollection 2022.
9
Comparative genomics identifies thousands of candidate structured RNAs in human microbiomes.
Genome Biol. 2021 Apr 12;22(1):100. doi: 10.1186/s13059-021-02319-w.
10
The impact of different negative training data on regulatory sequence predictions.
PLoS One. 2020 Dec 1;15(12):e0237412. doi: 10.1371/journal.pone.0237412. eCollection 2020.

References

1
Genome-wide discovery and verification of novel structured RNAs in Plasmodium falciparum.
Genome Res. 2008 Feb;18(2):281-92. doi: 10.1101/gr.6836108. Epub 2007 Dec 20.
2
The UCSC Genome Browser Database: 2008 update.
Nucleic Acids Res. 2008 Jan;36(Database issue):D773-9. doi: 10.1093/nar/gkm966. Epub 2007 Dec 17.
3
Identification of novel Drosophila melanogaster microRNAs.
PLoS One. 2007 Nov 28;2(11):e1265. doi: 10.1371/journal.pone.0001265.
4
Computational RNomics of drosophilids.
BMC Genomics. 2007 Nov 8;8:406. doi: 10.1186/1471-2164-8-406.
5
Prediction of structural noncoding RNAs with RNAz.
Methods Mol Biol. 2007;395:503-26. doi: 10.1007/978-1-59745-514-5_32.
6
Identification of differentially expressed small non-coding RNAs in the legume endosymbiont Sinorhizobium meliloti by comparative genomics.
Mol Microbiol. 2007 Dec;66(5):1080-91. doi: 10.1111/j.1365-2958.2007.05978.x. Epub 2007 Oct 25.
7
Calculation of folding energies of single-stranded nucleic acid sequences: conceptual issues.
J Theor Biol. 2007 Oct 21;248(4):745-53. doi: 10.1016/j.jtbi.2007.07.008. Epub 2007 Jul 18.
8
Use of tiling array data and RNA secondary structure predictions to identify noncoding RNA genes.
BMC Genomics. 2007 Jul 23;8:244. doi: 10.1186/1471-2164-8-244.
9
Structured RNAs in the ENCODE selected regions of the human genome.
Genome Res. 2007 Jun;17(6):852-64. doi: 10.1101/gr.5650707.
10
Annotating noncoding RNA genes.
Annu Rev Genomics Hum Genet. 2007;8:279-98. doi: 10.1146/annurev.genom.8.080706.092419.