Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Bioinformatics. 2010 Jun 15;26(12):i7-12. doi: 10.1093/bioinformatics/btq220.
Proteins exhibit complex subcellular distributions: a protein may localize to more than one organelle, and its location may vary with cell physiology. Estimating the amount of protein in each subcellular location is essential for quantitative understanding and modeling of protein dynamics and of how they affect cell behavior. We have previously described automated methods that use fluorescence microscope images to determine the fractions of protein fluorescence in various subcellular locations when the basic locations in which a protein can be present are known. Because this set of basic locations may be unknown (especially for proteome-wide studies), we here describe unsupervised methods that identify the fundamental patterns from images of mixed patterns and estimate the fraction of each fundamental pattern in a mixture.
We developed two approaches to the problem, both based on identifying types of objects present in images and representing patterns by frequencies of those object types. One is a basis pursuit method (which is based on a linear mixture model), and the other is based on latent Dirichlet allocation (LDA). For testing both approaches, we used images previously acquired for testing supervised unmixing methods. These images were of cells labeled with various combinations of two organelle-specific probes that had the same fluorescent properties to simulate mixed patterns of subcellular location.
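The linear mixture idea behind the first approach can be illustrated with a minimal sketch. It assumes that each image has already been reduced to a list of object-type labels and, unlike the unsupervised methods described here, that the two fundamental patterns are already known. The function names, the count-based frequencies, and the least-squares solve are illustrative assumptions, not the published implementation.

```python
from collections import Counter


def type_frequencies(object_types, n_types):
    """Return the fraction of objects of each type in one image.

    object_types: list of integer type labels (0 .. n_types - 1), one per
    detected object. How objects are detected and typed is assumed to have
    happened upstream and is not shown here.
    """
    if not object_types:
        raise ValueError("image has no objects")
    counts = Counter(object_types)
    total = len(object_types)
    return [counts.get(k, 0) / total for k in range(n_types)]


def mixing_fraction(mixed, basis_a, basis_b):
    """Estimate alpha in: mixed ~= alpha * basis_a + (1 - alpha) * basis_b.

    Closed-form least-squares solution for two patterns, clipped to [0, 1].
    The basis patterns are assumed known (hypothetical inputs).
    """
    diff = [a - b for a, b in zip(basis_a, basis_b)]
    resid = [m - b for m, b in zip(mixed, basis_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("basis patterns are identical")
    alpha = sum(r * d for r, d in zip(resid, diff)) / denom
    return min(1.0, max(0.0, alpha))


if __name__ == "__main__":
    # Hypothetical object-type frequencies for two pure patterns.
    pattern_a = [0.8, 0.2, 0.0]
    pattern_b = [0.1, 0.3, 0.6]
    # A synthetic mixture containing 25% of pattern A.
    mixture = [0.25 * a + 0.75 * b for a, b in zip(pattern_a, pattern_b)]
    print(mixing_fraction(mixture, pattern_a, pattern_b))
```

In the unsupervised setting the basis patterns themselves would have to be estimated from the mixed images, which this sketch does not attempt.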
We achieved correlations of 0.80 and 0.91 between the estimated and underlying fractions of the two probes (fundamental patterns) with the basis pursuit and LDA approaches, respectively, indicating that our methods can unmix complex subcellular distributions with reasonably high accuracy.
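As an illustration of how such an evaluation could be computed, the following sketch correlates estimated fractions with known fractions. It assumes a Pearson correlation coefficient, which the abstract does not specify, and the sample values are made up for demonstration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical true and estimated fractions of one probe per condition.
    true_fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
    estimated = [0.05, 0.2, 0.55, 0.7, 0.9]
    print(pearson(true_fractions, estimated))
```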