School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel.
Nucleic Acids Res. 2010 Jul;38(Web Server issue):W84-9. doi: 10.1093/nar/gkq320. Epub 2010 May 5.
Derivation of biological meaning from large sets of proteins or genes is a frequent task in genomic and proteomic studies. Such sets often arise from experimental methods including large-scale gene expression experiments and mass spectrometry (MS) proteomics. Large sets of genes or proteins are also the outcome of computational methods such as BLAST search and homology-based classifications. We have developed the PANDORA web server, which functions as a platform for the advanced biological analysis of sets of genes, proteins, or proteolytic peptides. First, the input set is mapped to a set of corresponding proteins. Then, an analysis of the protein set produces a graph-based hierarchy which highlights intrinsic relations amongst biological subsets, in light of their different annotations from multiple annotation resources. PANDORA integrates a large collection of annotation sources (GO, UniProt Keywords, InterPro, Enzyme, SCOP, CATH, Gene-3D, NCBI taxonomy and more) that comprise approximately 200,000 different annotation terms associated with approximately 3.2 million sequences from UniProtKB. Statistical enrichment based on a binomial approximation of the hypergeometric distribution and corrected for multiple hypothesis tests is calculated using several background sets, including major gene-expression DNA-chip platforms. Users can also visualize either standard or user-defined binary and quantitative properties alongside the proteins. PANDORA 4.2 is available at http://www.pandora.cs.huji.ac.il.
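The enrichment calculation described above can be illustrated with a minimal sketch. This is not PANDORA's implementation: the function names are hypothetical, and Bonferroni is assumed as the multiple-testing correction because the abstract does not name the procedure used. Here N is the size of the background set, K the number of background proteins carrying the annotation term, n the size of the input set, and k the number of input proteins carrying the term.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Exact P(X >= k): n proteins drawn without replacement from a
    background of N proteins, K of which carry the annotation term."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); approximates the hypergeometric
    tail when the input set is small relative to the background (n << N)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enrichment(k, N, K, n, n_terms):
    """Return (exact p, binomial approximation, corrected p).

    The correction is Bonferroni over n_terms tested annotation terms,
    which is an assumption of this sketch, not a detail from the source.
    """
    exact = hypergeom_sf(k, N, K, n)
    approx = binom_sf(k, n, K / N)
    corrected = min(1.0, approx * n_terms)
    return exact, approx, corrected


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 background proteins, 200 with the term,
    # an input set of 50 proteins containing 5 with the term, 1,000 terms tested.
    exact, approx, corrected = enrichment(k=5, N=20000, K=200, n=50, n_terms=1000)
    print(f"exact={exact:.3e} binomial={approx:.3e} corrected={corrected:.3e}")
```

Changing the background set (for example, to the proteins represented on a particular DNA-chip platform) changes N and K, and therefore the resulting p-value, which is why the choice of background matters.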