Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA.
Nucleic Acids Res. 2021 Jan 8;49(D1):D298-D308. doi: 10.1093/nar/gkaa931.
We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position-specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by amino acid sequence, UniProt accession number or entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors, and they can also be downloaded in structured formats at the protein, proteome and whole-database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.