Sundstrom Andrew, Cirrone Silvio, Paxia Salvatore, Hsueh Carlin, Kjolby Rachel, Gimzewski James K, Reed Jason, Mishra Bud
IEEE Trans Inf Technol Biomed. 2012 Nov;16(6):1200-7. doi: 10.1109/TITB.2012.2206819. Epub 2012 Jun 29.
In many pattern analysis problems, systematic characterizations are possible if a small number of useful features or parameters of the image are known a priori or can be estimated reasonably well. Often the relevant features of a particular pattern analysis problem are easy to enumerate, as when the statistical structure of the patterns is well understood from domain knowledge. We study a problem from molecular image analysis, where such domain-dependent understanding may be lacking to some degree and the features must be inferred via machine-learning techniques. In this paper, we propose a rigorous, fully automated technique for this problem. We are motivated by an application of atomic force microscopy (AFM) image processing needed to solve a central problem in molecular biology: obtaining the complete transcription profile of a single cell, a snapshot that shows which genes are being expressed and to what degree. Reed et al. (Single molecule transcription profiling with AFM, Nanotechnology, 18:4, 2007) showed that the transcription profiling problem reduces to making high-precision measurements of biomolecule backbone lengths, correct to within 20-25 bp (6-7.5 nm). Here we present an AFM image processing and length estimation pipeline that comes close to achieving these measurement tolerances. In particular, we develop a biased length estimator built on the trained coefficients of a simple linear regression model, biweighted by a Beaton-Tukey function, whose feature universe is constrained by James-Stein shrinkage to avoid overfitting. In terms of extensibility and of addressing the model selection problem, this formulation subsumes the models we studied.
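The abstract names the ingredients of the estimator (a linear regression, Beaton-Tukey biweighting, James-Stein shrinkage) but not how they are combined. The following is a minimal pure-Python sketch under stated assumptions. It uses ordinary iteratively reweighted least squares (IRLS) with Tukey's biweight, with the common default tuning constant c = 4.685 and a median-absolute-residual scale estimate. It then applies positive-part James-Stein shrinkage of the slope coefficients toward zero, assuming a known, common coefficient variance. The function names, feature columns, and synthetic data are hypothetical illustrations, not the authors' AFM features or pipeline.

```python
import statistics


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def weighted_lstsq(X, y, w):
    """Weighted least squares via the normal equations (X^T W X) beta = X^T W y."""
    p = len(X[0])
    A = [[sum(wi * xi[j] * xi[k] for xi, wi in zip(X, w)) for k in range(p)] for j in range(p)]
    b = [sum(wi * xi[j] * yi for xi, yi, wi in zip(X, y, w)) for j in range(p)]
    return solve(A, b)


def tukey_biweight(u, c=4.685):
    """Tukey biweight: smooth down-weighting, zero weight once |u| >= c."""
    return (1.0 - (u / c) ** 2) ** 2 if abs(u) < c else 0.0


def biweight_regression(X, y, c=4.685, iters=30):
    """Robust linear regression by IRLS with Tukey biweight weights."""
    w = [1.0] * len(y)
    beta = weighted_lstsq(X, y, w)  # ordinary least squares starting point
    for _ in range(iters):
        resid = [yi - sum(b * v for b, v in zip(beta, xi)) for xi, yi in zip(X, y)]
        # Robust residual scale: median absolute residual, rescaled for Gaussian
        # noise; falls back to 1.0 if every residual is exactly zero.
        scale = statistics.median([abs(r) for r in resid]) / 0.6745 or 1.0
        w = [tukey_biweight(r / scale, c) for r in resid]
        beta = weighted_lstsq(X, y, w)
    return beta


def james_stein(coefs, var):
    """Positive-part James-Stein shrinkage of p >= 3 coefficients toward zero.

    Assumption: each coefficient estimate has the same, independent variance `var`.
    """
    p = len(coefs)
    norm2 = sum(b * b for b in coefs)
    if p < 3 or norm2 == 0:
        return list(coefs)
    factor = max(0.0, 1.0 - (p - 2) * var / norm2)
    return [factor * b for b in coefs]


def make_demo_data():
    """Hypothetical data: y = 2 + 3*x1 - x2 + 0.5*x3 + small noise, plus one gross outlier."""
    X, y = [], []
    for i in range(20):
        x1, x2, x3 = float(i), float((7 * i) % 5), float((3 * i) % 4)
        X.append([1.0, x1, x2, x3])  # leading 1.0 is the intercept column
        y.append(2 + 3 * x1 - x2 + 0.5 * x3 + 0.01 * ((i % 3) - 1))
    y[10] += 100.0  # a single badly corrupted measurement
    return X, y


X, y = make_demo_data()
ols = weighted_lstsq(X, y, [1.0] * len(y))
robust = biweight_regression(X, y)
shrunk = [robust[0]] + james_stein(robust[1:], var=0.01)  # intercept left unshrunk
print("OLS:   ", [round(b, 3) for b in ols])
print("robust:", [round(b, 3) for b in robust])
print("shrunk:", [round(b, 3) for b in shrunk])
```

On this synthetic data the single corrupted point pulls the ordinary least-squares coefficients away from the generating values. The biweight fit assigns that point zero weight and recovers them closely. Re-estimating the residual scale on every pass is what lets the biweight reject the outlier. The James-Stein step only applies when at least three coefficients are shrunk together.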