Division of Biological Chemistry and Drug Discovery, College of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, UK.
BMC Genomics. 2011 Jan 27;12:74. doi: 10.1186/1471-2164-12-74.
Although primarily known as the site of ribosome subunit production, the nucleolus is involved in numerous and diverse cellular processes. Recent large-scale proteomics projects have identified thousands of human proteins that associate with the nucleolus. However, in most cases, we know neither what fraction of each protein pool is nucleolus-associated nor whether that association is permanent or conditional.
To describe the dynamic localisation of proteins in the nucleolus, we investigated the extent of their nucleolar association, first collating an extensively curated, literature-derived dataset. This dataset then served to train a probabilistic predictor that integrates gene and protein characteristics. Most previous experimental and computational studies of the nucleolar proteome produce large static lists of nucleolar proteins regardless of their extent of nucleolar association; our predictor instead models the fluidity of the nucleolus by considering different classes of nucleolar-associated proteins. The new method classifies every human protein as nucleolar-enriched, nucleolar-nucleoplasmic, nucleolar-cytoplasmic or non-nucleolar. In leave-one-out cross-validation, sensitivity for these four classes ranges from 0.72 to 0.90 and positive predictive value from 0.63 to 0.94. The overall accuracy of the classifier was 0.85 on an independent literature-based test set and 0.74 on a large independent quantitative proteomics dataset. While the three nucleolar-association groups display vastly different Gene Ontology biological process signatures and evolutionary characteristics, they collectively represent the best-characterised nucleolar functions.
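The per-class figures quoted above (sensitivity and positive predictive value) and the overall accuracy can be computed from paired true and predicted labels. The following is a minimal sketch, not the authors' code: the function names `per_class_metrics` and `overall_accuracy` are hypothetical, and only the four class names are taken from the abstract.

```python
from collections import Counter

# The four classes named in the abstract.
CLASSES = [
    "nucleolar-enriched",
    "nucleolar-nucleoplasmic",
    "nucleolar-cytoplasmic",
    "non-nucleolar",
]


def per_class_metrics(true_labels, predicted_labels, classes=CLASSES):
    """Return sensitivity and positive predictive value (PPV) for each class.

    sensitivity = TP / (number of proteins truly in the class)
    PPV         = TP / (number of proteins predicted to be in the class)
    A metric is None when its denominator is zero.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")

    # Count each (true, predicted) pair: a sparse confusion matrix.
    pairs = Counter(zip(true_labels, predicted_labels))

    metrics = {}
    for cls in classes:
        true_positives = pairs[(cls, cls)]
        n_actual = sum(n for (t, _), n in pairs.items() if t == cls)
        n_predicted = sum(n for (_, p), n in pairs.items() if p == cls)
        metrics[cls] = {
            "sensitivity": true_positives / n_actual if n_actual else None,
            "ppv": true_positives / n_predicted if n_predicted else None,
        }
    return metrics


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of proteins whose predicted class equals the true class."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)
```

In a leave-one-out setting, `predicted_labels` would hold the prediction made for each protein by a model trained on all the other proteins; the metrics are then computed once over the pooled predictions.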
Our proteome-wide classification of nucleolar association provides a novel representation of the dynamic content of the nucleolus. This model of nucleolar localisation thus increases the coverage while providing accurate and specific annotations of the nucleolar proteome. It will be instrumental in better understanding the central role of the nucleolus in the cell and its interaction with other subcellular compartments.