Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China.
Key Laboratory of Protein and Peptide Pharmaceuticals and Laboratory of Proteomics, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China.
Brief Bioinform. 2018 Jul 20;19(4):636-643. doi: 10.1093/bib/bbx005.
"Small proteins" is the general term for proteins shorter than 100 amino acids. Identification and functional studies of small proteins have advanced rapidly in recent years, and several studies have shown that small proteins play important roles in diverse processes, including development, muscle contraction and DNA repair. Identifying and characterizing previously unrecognized small proteins may therefore make important contributions to cell biology and human health. Current databases, however, are generally deficient: they either have not collected small proteins systematically or contain only predicted small proteins from a limited number of tissues and species. Here, we present the small proteins database (SmProt, http://bioinfo.ibp.ac.cn/SmProt), a web-accessible database designed specifically to document small proteins. The current release of SmProt incorporates 255 010 small proteins identified computationally or experimentally in 291 cell lines/tissues from eight popular species. The database provides a variety of data, including basic information (sequence, location, gene name, organism, etc.) and specific information (experiment, function, disease type, etc.). To facilitate data extraction, SmProt supports multiple search options: species, genome location, gene name and its aliases, cell line/tissue, ORF type, gene type, PubMed ID and SmProt ID. SmProt also provides a BLAST alignment search service and a local UCSC Genome Browser. Additionally, SmProt defines a high-confidence set of small proteins and predicts their functions.
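To make the length criterion concrete, the sketch below scans a nucleotide sequence for open reading frames (ORFs) whose encoded protein is shorter than 100 amino acids. It is a hypothetical illustration only, not SmProt's identification pipeline: the function name `find_small_orfs` is invented here, and it assumes ATG-initiated ORFs on the forward strand with the standard stop codons.

```python
# Hypothetical sketch: not SmProt's pipeline. Forward strand only, ATG starts only.
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_SMALL_PROTEIN_LEN = 100  # proteins shorter than this many amino acids count as "small"


def find_small_orfs(seq: str) -> list[tuple[int, int, int]]:
    """Return (start, end, aa_length) for ORFs encoding proteins shorter than 100 aa.

    start is 0-based; end is exclusive and includes the stop codon.
    aa_length counts codons from the ATG up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # the three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                aa_len = (i - start) // 3
                if aa_len < MAX_SMALL_PROTEIN_LEN:
                    orfs.append((start, i + 3, aa_len))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)
```

For example, `find_small_orfs("ATGAAATAG")` reports a single two-residue ORF (Met-Lys), while an ORF of 151 codons before its stop is excluded by the length cutoff.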