Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, Padua, Italy.
Department of Agricultural Sciences, University of Udine, via Palladio 8, Udine, Italy.
Database (Oxford). 2018 Jan 1;2018:bay125. doi: 10.1093/database/bay125.
Despite a fast-growing number of available plant genomes, existing computational resources are poorly integrated and provide only limited access to the underlying data. Most existing databases focus on DNA/RNA data or specific gene families, with less emphasis on protein structure, function and variability. In particular, despite the economic importance of many plant accessions, there is no straightforward way to retrieve or visualize information on their differences. To fill this gap, we developed PhytoTypeDB (http://phytotypedb.bio.unipd.it/), a scalable database containing plant protein annotations and genetic variants from resequencing of different accessions. The database content is generated by an integrated pipeline that applies state-of-the-art methods for protein characterization and requires only the reference proteome sequence and variant calling files as input. For over 95% of the entries, names for otherwise unknown proteins are inferred by homology. Single-nucleotide variants are visualized along with protein annotation in a user-friendly web interface. The server offers an effective querying system, which allows users to compare variability among different species and accessions, to generate custom data sets based on shared functional features or to perform sequence searches. A documented set of exposed RESTful endpoints makes the data accessible programmatically to third-party clients.
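To illustrate the idea of viewing single-nucleotide variants alongside protein annotation, the following is a minimal sketch and not the PhytoTypeDB pipeline itself. The `Region` and `Variant` types, the function `variants_by_region`, and all example values are hypothetical and are assumed here only for illustration; coordinates are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Region:
    """Hypothetical annotated stretch of a protein (1-based, inclusive)."""
    name: str
    start: int
    end: int


@dataclass(frozen=True)
class Variant:
    """Hypothetical amino acid change caused by a single-nucleotide variant."""
    position: int   # 1-based residue position in the reference protein
    ref: str        # reference residue
    alt: str        # residue observed in the resequenced accession
    accession: str  # label of the accession carrying the variant


def variants_by_region(regions: List[Region],
                       variants: List[Variant]) -> Dict[str, List[Variant]]:
    """Group variants under every annotated region that contains them."""
    hits: Dict[str, List[Variant]] = {region.name: [] for region in regions}
    for variant in variants:
        for region in regions:
            if region.start <= variant.position <= region.end:
                hits[region.name].append(variant)
    return hits


# Made-up example data, not taken from the database.
regions = [Region("kinase domain", 10, 120), Region("disordered tail", 121, 150)]
variants = [
    Variant(45, "A", "T", "accession_1"),
    Variant(130, "G", "S", "accession_2"),
    Variant(200, "L", "P", "accession_1"),  # falls outside every region
]
hits = variants_by_region(regions, variants)
```

Grouping by region rather than by variant keeps the result aligned with the annotation tracks, so accessions can be compared feature by feature; variants outside all annotated regions are simply left out of this view.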