The Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, WA, Australia.
Adv Exp Med Biol. 2021;1346:67-89. doi: 10.1007/978-3-030-80352-0_4.
In eukaryotic organisms, subcellular protein location is critical in defining protein function and understanding sub-functionalization of gene families. Some proteins have defined locations, whereas others have low-specificity targeting and complex accumulation patterns. There is no single approach that can be considered entirely adequate for defining the in vivo location of all proteins. By combining evidence from different approaches, the strengths and weaknesses of different technologies can be estimated, and a location consensus can be built. The Subcellular Location of Proteins in Arabidopsis database (http://suba.live/) combines experimental data sets that have been reported in the literature and analyzes these data to provide useful tools for biologists to interpret their own data. Foremost among these tools is a consensus classifier (SUBAcon) that computes a proposed location for all proteins by balancing the experimental evidence and predictions. Further tools analyze sets of proteins to define the abundance of cellular structures. Extending these types of resources to plant crop species has been complex due to polyploidy, gene family expansion and contraction, and the movement of pathways and processes within cells across the plant kingdom. The Crop Proteins of Annotated Location database (http://crop-pal.org/) has developed a range of subcellular location resources, including a species-specific voting consensus for 12 plant crop species that offers collated evidence and filters for current crop proteomes akin to SUBA. Comprehensive cross-species comparison of these data shows that the subcellular proteomes (subcellulomes) depend only to some degree on phylogenetic relationship and are more conserved in major biosynthesis than in metabolic pathways. Together, SUBA and cropPAL created reference subcellulomes for plants as well as species-specific subcellulomes for cross-species data mining.
These data collections are increasingly used by the research community to provide a subcellular protein location layer, inform models of compartmented cell function and protein-protein interaction networks, guide future molecular crop breeding strategies, or simply answer a specific question: where is my protein of interest inside the cell?
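The idea of building a location consensus from several lines of evidence can be illustrated with a small weighted vote. This is a minimal sketch only: the function name, the evidence categories, and the weights are hypothetical and are not taken from SUBA, SUBAcon, or cropPAL, whose actual classifiers are not described here.

```python
from collections import defaultdict

# Hypothetical weights per evidence type (illustrative assumption only;
# these values do not come from SUBA or cropPAL).
EVIDENCE_WEIGHTS = {
    "fluorescent_protein": 3.0,
    "mass_spectrometry": 2.0,
    "prediction": 1.0,
}


def consensus_location(evidence):
    """Return (location, share_of_total_weight) for the best-supported location.

    `evidence` is an iterable of (evidence_type, location) pairs.
    Unknown evidence types contribute zero weight.
    Returns (None, 0.0) when no weighted evidence is available.
    """
    scores = defaultdict(float)
    for evidence_type, location in evidence:
        scores[location] += EVIDENCE_WEIGHTS.get(evidence_type, 0.0)

    total = sum(scores.values())
    if total == 0:
        return None, 0.0

    # Highest summed weight wins; ties are broken alphabetically
    # so the result is deterministic.
    best = min(scores, key=lambda loc: (-scores[loc], loc))
    return best, scores[best] / total


# Example: one protein with conflicting evidence from four sources.
example_evidence = [
    ("fluorescent_protein", "mitochondrion"),
    ("mass_spectrometry", "plastid"),
    ("prediction", "plastid"),
    ("prediction", "mitochondrion"),
]
location, share = consensus_location(example_evidence)
print(location, round(share, 2))
```

Reporting the winning location together with its share of the total weight keeps disagreement between sources visible, which matters for proteins with low-specificity targeting.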