Heazlewood Joshua L, Tonti-Filippini Julian, Verboom Robert E, Millar A Harvey
Australian Research Council Centre of Excellence in Plant Energy Biology, University of Western Australia, Crawley.
Plant Physiol. 2005 Oct;139(2):598-609. doi: 10.1104/pp.105.065532.
Substantial experimental datasets defining the subcellular location of Arabidopsis (Arabidopsis thaliana) proteins have been reported in the literature in the form of organelle proteomes built from mass spectrometry data (approximately 2,500 proteins). Subcellular location for specific proteins has also been published based on imaging of chimeric fluorescent fusion proteins in intact cells (approximately 900 proteins). Further, the more diverse history of biochemical determination of subcellular location is stored in the entries of the Swiss-Prot database for the products of many Arabidopsis genes (approximately 1,800 proteins). Combined with the range of bioinformatic targeting prediction tools and comparative genomic analysis, these experimental datasets provide a powerful basis for defining the final location of proteins within the wide variety of subcellular structures present inside Arabidopsis cells. We have analyzed these published experimental and prediction data to answer a range of substantial questions facing researchers about the veracity of these approaches to determining protein location and their interrelatedness. We have merged these data to form the subcellular location database for Arabidopsis proteins (SUBA), providing an integrated understanding of protein location, encompassing the plastid, mitochondrion, peroxisome, nucleus, plasma membrane, endoplasmic reticulum, vacuole, Golgi, cytoskeleton structures, and cytosol (www.suba.bcs.uwa.edu.au). This includes data on more than 4,400 nonredundant Arabidopsis protein sequences. We also provide researchers with an online resource that may be used to query protein sets or protein families and determine whether predicted or experimental location data exist; to analyze the nature of contamination between published proteome sets; and/or for building theoretical subcellular proteomes in Arabidopsis using the latest experimental data.
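As a rough illustration of the kind of merging and querying described above, the sketch below combines location calls from several evidence types and flags proteins that a single evidence type assigns to more than one compartment, which is one simple way to look for possible contamination between proteome sets. It is a minimal sketch under stated assumptions, not SUBA's actual schema or query interface: the record layout, the function names, the source labels, and the locus identifiers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records: (locus identifier, evidence source, location).
# The identifiers and location calls are illustrative only, not taken from SUBA.
RECORDS = [
    ("AT1G00001", "MS", "mitochondrion"),
    ("AT1G00001", "GFP", "mitochondrion"),
    ("AT1G00002", "MS", "plastid"),
    ("AT1G00002", "MS", "mitochondrion"),
    ("AT1G00003", "SwissProt", "nucleus"),
]


def merge_locations(records):
    """Group location calls by protein, then by evidence source."""
    merged = defaultdict(lambda: defaultdict(set))
    for protein, source, location in records:
        merged[protein][source].add(location)
    return merged


def multi_location_calls(merged, source):
    """Return proteins that one evidence source places in more than one compartment."""
    return sorted(
        protein
        for protein, by_source in merged.items()
        if len(by_source.get(source, ())) > 1
    )


merged = merge_locations(RECORDS)
print(len(merged))                         # nonredundant proteins in the toy set
print(multi_location_calls(merged, "MS"))  # candidates for cross-proteome contamination
```

Keying on the protein identifier first means the count of nonredundant proteins falls out of the merged structure directly, while keeping the evidence sources separate leaves experimental and curated calls available for comparison.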