Cao Ruifang, Shi Yan, Chen Shuangguan, Ma Yimin, Chen Jiajun, Yang Juan, Chen Geng, Shi Tieliu
The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China.
Nucleic Acids Res. 2017 Jan 4;45(D1):D827-D832. doi: 10.1093/nar/gkw1096. Epub 2016 Nov 29.
Millions of human single nucleotide polymorphisms (SNPs) or mutations have been identified so far, and these variants could be strongly correlated with phenotypic variations of traits/diseases. Among these variants, non-synonymous ones can result in amino-acid changes that are called single amino-acid polymorphisms (SAPs). Although some studies have investigated SAPs, only a small fraction of them have been identified, owing to inadequate inferred protein variation databases and the low coverage of mass spectrometry (MS) experiments. Here, we present the dbSAP database, which provides convenient access to comprehensive information on SAPs and on the relationships among their spectra, peptides and proteins, as well as related genes, pathways, diseases and drug targets. To fully explore human SAPs, we built a customized protein database of variant proteins by integrating and annotating the human SNPs and mutations from eight distinct databases (UniProt, Protein Mutation Database, HPMD, MSIPI, MS-CanProVar, dbSNP, Ensembl and COSMIC). After a series of quality controls, a total of 16 854 SAP peptides involving 439 537 spectra were identified from large-scale MS datasets covering various human tissues and cell lines. dbSAP is freely available at http://www.megabionet.org/dbSAP/index.html.
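The idea of a customized variant protein database can be illustrated with a minimal Python sketch. It is not the dbSAP pipeline: the function names (`apply_sap`, `tryptic_peptides`, `sap_peptide`) and the toy sequence are hypothetical, and digestion is assumed to follow a simplified trypsin rule (cleave after K or R unless the next residue is P, with no missed cleavages). The sketch applies one amino-acid substitution to a protein sequence and returns the peptide that would carry the SAP, which is the kind of entry an MS search engine would need in order to match a variant spectrum.

```python
def apply_sap(protein: str, position: int, ref: str, alt: str) -> str:
    """Return the protein with the residue at `position` (1-based) changed from ref to alt."""
    if not 1 <= position <= len(protein):
        raise ValueError("position is outside the protein sequence")
    if protein[position - 1] != ref:
        raise ValueError("reference residue does not match the sequence")
    return protein[:position - 1] + alt + protein[position:]


def tryptic_peptides(protein: str) -> list[tuple[int, str]]:
    """Simplified tryptic digest: cleave after K or R unless followed by P.

    Returns (1-based start position, peptide) pairs; no missed cleavages.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append((start + 1, protein[start:i + 1]))
            start = i + 1
    if start < len(protein):
        # Trailing peptide that does not end in K or R.
        peptides.append((start + 1, protein[start:]))
    return peptides


def sap_peptide(protein: str, position: int, ref: str, alt: str) -> str:
    """Return the tryptic peptide of the variant protein that contains the SAP site."""
    variant = apply_sap(protein, position, ref, alt)
    for start, peptide in tryptic_peptides(variant):
        if start <= position < start + len(peptide):
            return peptide
    raise RuntimeError("no peptide covers the SAP position")


# Toy sequence (made up for illustration), with a hypothetical Y5C substitution.
toy_protein = "MKTAYIAKQRQISFVK"
variant_peptide = sap_peptide(toy_protein, 5, "Y", "C")
```

Because the digest is run on the variant sequence rather than the reference, a substitution that creates or removes a K or R also changes the peptide boundaries, which a reference-only database would miss.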