Chen Xiang, Velliste Meel, Murphy Robert F
Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Cytometry A. 2006 Jul;69(7):631-40. doi: 10.1002/cyto.a.20280.
Proteomics, the large-scale identification and characterization of many or all proteins expressed in a given cell type, has become a major area of biological research. In addition to information on protein sequence, structure, and expression levels, knowledge of a protein's subcellular location is essential to a complete understanding of its functions. Currently, subcellular location patterns are routinely determined by visual inspection of fluorescence microscope images. We review here research aimed at creating systems for automated, systematic determination of location. These systems employ numerical feature extraction from images, feature reduction to identify the most useful features, and various supervised learning (classification) and unsupervised learning (clustering) methods. These methods have been shown to perform significantly better than human interpretation of the same images. When coupled with technologies for tagging large numbers of proteins and with high-throughput microscope systems, the computational methods reviewed here enable the new subfield of location proteomics. This subfield will make critical contributions in two related areas. First, it will provide structured, high-resolution information on location to enable Systems Biology efforts to simulate cell behavior from the gene level up. Second, it will provide tools for Cytomics projects aimed at characterizing the behaviors of all cell types before, during, and after the onset of various diseases.
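The pipeline named in the abstract (feature extraction, feature reduction, supervised classification) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the reviewed work: the two synthetic patterns ("diffuse" and "punctate"), the four toy features, the Fisher-style ranking used as a stand-in for feature reduction, and the nearest-centroid rule used as a stand-in classifier.

```python
import random
from statistics import mean, pvariance, pstdev

SIZE = 16  # side length of the synthetic images (assumption)
FEATURE_NAMES = ["mean", "variance", "frac_bright", "lr_balance"]  # hypothetical features

def make_image(pattern, rng):
    """Generate a toy grayscale image for a hypothetical location pattern."""
    if pattern == "diffuse":  # roughly uniform signal everywhere
        return [[0.5 + rng.uniform(-0.1, 0.1) for _ in range(SIZE)] for _ in range(SIZE)]
    img = [[rng.uniform(0.0, 0.1) for _ in range(SIZE)] for _ in range(SIZE)]
    for _ in range(5):  # "punctate": a few bright spots on a dim background
        img[rng.randrange(SIZE)][rng.randrange(SIZE)] = 1.0
    return img

def extract_features(img):
    """Numerical feature extraction: reduce an image to a short feature vector."""
    pixels = [p for row in img for p in row]
    left = [p for row in img for p in row[:SIZE // 2]]
    right = [p for row in img for p in row[SIZE // 2:]]
    return [
        mean(pixels),                                # overall intensity
        pvariance(pixels),                           # intensity spread
        sum(p > 0.8 for p in pixels) / len(pixels),  # fraction of bright pixels
        mean(left) - mean(right),                    # left/right balance (expected to be uninformative)
    ]

def fisher_scores(X, y):
    """Score each feature by squared class-mean gap over summed class variances (two classes)."""
    a, b = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        va = [x[j] for x, lab in zip(X, y) if lab == a]
        vb = [x[j] for x, lab in zip(X, y) if lab == b]
        scores.append((mean(va) - mean(vb)) ** 2 / (pvariance(va) + pvariance(vb) + 1e-12))
    return scores

def select_top(scores, k):
    """Feature reduction: keep the indices of the k highest-scoring features."""
    return sorted(sorted(range(len(scores)), key=lambda j: -scores[j])[:k])

def fit_centroids(X, y, cols):
    """Z-score the selected features, then store one centroid per class."""
    mu = [mean([x[j] for x in X]) for j in cols]
    sd = [pstdev([x[j] for x in X]) or 1.0 for j in cols]

    def transform(x):
        return [(x[j] - m) / s for j, m, s in zip(cols, mu, sd)]

    centroids = {}
    for lab in set(y):
        rows = [transform(x) for x, l in zip(X, y) if l == lab]
        centroids[lab] = [mean(col) for col in zip(*rows)]
    return transform, centroids

def predict(x, transform, centroids):
    """Supervised classification: assign the class with the nearest centroid."""
    z = transform(x)
    return min(centroids, key=lambda lab: sum((p - q) ** 2 for p, q in zip(z, centroids[lab])))

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def dataset(n):
    X, y = [], []
    for pattern in ("diffuse", "punctate"):
        for _ in range(n):
            X.append(extract_features(make_image(pattern, rng)))
            y.append(pattern)
    return X, y

X_train, y_train = dataset(20)
X_test, y_test = dataset(10)
selected = select_top(fisher_scores(X_train, y_train), k=2)
transform, centroids = fit_centroids(X_train, y_train, selected)
hits = sum(predict(x, transform, centroids) == t for x, t in zip(X_test, y_test))
accuracy = hits / len(y_test)
print("selected features:", [FEATURE_NAMES[j] for j in selected])
print("held-out accuracy:", accuracy)
```

Real fluorescence images would call for much richer features and stronger classifiers than this toy setup; the sketch only shows how the three stages connect.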