Nawrocki Eric P, Burge Sarah W, Bateman Alex, Daub Jennifer, Eberhardt Ruth Y, Eddy Sean R, Floden Evan W, Gardner Paul P, Jones Thomas A, Tate John, Finn Robert D
HHMI Janelia Farm Research Campus, Ashburn, VA, USA.
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
Nucleic Acids Res. 2015 Jan;43(Database issue):D130-7. doi: 10.1093/nar/gku1063. Epub 2014 Nov 11.
The Rfam database (available at http://rfam.xfam.org) is a collection of non-coding RNA families represented by manually curated sequence alignments, consensus secondary structures and annotation gathered from corresponding Wikipedia, taxonomy and ontology resources. In this article, we detail updates and improvements to the Rfam data and website for the Rfam 12.0 release. We describe the upgrade of our search pipeline to use Infernal 1.1 and demonstrate its improved homology detection ability by comparison with the previous version. The new pipeline is easier for users to apply to their own data sets, and we illustrate its ability to annotate RNAs in genomic and metagenomic data sets of various sizes. Rfam has been expanded to include 260 new families, including the well-studied large subunit ribosomal RNA family, and for the first time includes information on short sequence- and structure-based RNA motifs present within families.