Papanicolaou Alexie, Gebauer-Jung Steffi, Blaxter Mark L, McMillan W Owen, Jiggins Chris D
Institute for Evolutionary Biology, University of Edinburgh, King's Buildings, EH9 3JT, UK.
Nucleic Acids Res. 2008 Jan;36(Database issue):D582-7. doi: 10.1093/nar/gkm853. Epub 2007 Oct 12.
With over 100 000 species, and studied by a large community of evolutionary biologists, population ecologists, pest biologists and genome researchers, the Lepidoptera are an important insect group. Genomic resources [expressed sequence tags (ESTs), genome sequence, genetic and physical maps, proteomic and microarray datasets] are growing, but until now there has been no single portal for accessing and analysing them. Here we present ButterflyBase (http://www.butterflybase.org), a unified resource for lepidopteran genomics. A total of 273 077 ESTs from more than 30 different species have been clustered to generate stable unigene sets, and robust protein translations have been derived from each unigene cluster. Clusters and their protein translations are annotated with BLAST-based similarity, Gene Ontology (GO), enzyme classification (EC) and Kyoto Encyclopedia of Genes and Genomes (KEGG) terms, and are also searchable using similarity tools such as BLAST and MS-BLAST. The database supports many needs of the lepidopteran research community, including molecular marker development, orthologue prediction for deep phylogenetics, and detection of rapidly evolving proteins that are likely to be involved in host-pathogen interactions or other evolutionary processes. ButterflyBase is expanding to include additional genomic sequence, ecological and mapping data for key species.
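The clustering step described in the abstract (grouping ESTs into unigene sets) can be illustrated with a small sketch. This is not the ButterflyBase pipeline: it is a toy single-linkage clustering that links two ESTs when they share a minimum number of k-mers. The function names (`kmers`, `cluster_ests`), the parameters `k` and `min_shared`, and the example sequences are all hypothetical and chosen only for illustration. Reverse-complement matches and sequencing errors, which a real EST clustering tool must handle, are ignored here.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests, k=8, min_shared=3):
    """Group ESTs into toy 'unigene' clusters by single linkage.

    ests: dict mapping a (hypothetical) EST id to its nucleotide sequence.
    Two ESTs are linked when they share at least min_shared distinct k-mers.
    Returns a sorted list of sorted id lists, one list per cluster.
    """
    # Union-find structure: every EST starts in its own cluster.
    parent = {est_id: est_id for est_id in ests}

    def find(x):
        # Follow parent pointers to the root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Inverted index: k-mer -> ids of the ESTs that contain it.
    index = defaultdict(set)
    for est_id, seq in ests.items():
        for kmer in kmers(seq, k):
            index[kmer].add(est_id)

    # Count distinct shared k-mers for every pair that co-occurs in the index.
    shared = defaultdict(int)
    for ids in index.values():
        ordered = sorted(ids)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                shared[(a, b)] += 1

    # Merge the clusters of every sufficiently similar pair.
    for (a, b), count in shared.items():
        if count >= min_shared:
            union(a, b)

    clusters = defaultdict(list)
    for est_id in ests:
        clusters[find(est_id)].append(est_id)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Made-up sequences: est2 overlaps the 3' end of est1, est3 is unrelated.
    est1 = "ATGGCGTACGTTAGCCTAGGCTAAGCT"
    example = {
        "est1": est1,
        "est2": est1[10:] + "TTGACCGATCGA",
        "est3": "CCCCCGGGGGTTTTTAAAAACCCCCGGGGG",
    }
    print(cluster_ests(example))  # -> [['est1', 'est2'], ['est3']]
```

Single linkage is used here because it is the simplest way to show how overlapping reads collapse into one cluster. Each resulting cluster would then be assembled into a consensus sequence and translated, steps this sketch does not attempt.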