Sigdel Madhav, Pusey Marc L, Aygun Ramazan S
Department of Computer Science, University of Alabama in Huntsville, Huntsville, USA.
iXpressGenes, Inc., 601 Genome Way, Huntsville, USA.
Cryst Growth Des. 2013 Jul 3;13(7):2728-2736. doi: 10.1021/cg3016029.
In this paper, we describe the design and implementation of a stand-alone, real-time system for protein crystallization image acquisition and classification, with the goal of assisting crystallographers in scoring crystallization trials. An in-house assembled fluorescence microscopy system was built for image acquisition. The images are classified into three categories: non-crystals, likely leads, and crystals. Image classification consists of two main steps: image feature extraction and classification using multilayer perceptron (MLP) neural networks. Our feature extraction involves applying multiple thresholding techniques, identifying high-intensity regions (blobs), and generating intensity and blob features to obtain a 45-dimensional feature vector per image. To reduce the risk of missing crystals, we introduce a max-class ensemble classifier, which applies multiple classifiers and chooses the highest score (or class). We performed our experiments on 2250 images, consisting of 67% non-crystal, 18% likely-lead, and 15% clear crystal images, and evaluated our results using 10-fold cross-validation. Our results demonstrate that the method is very efficient (< 3 seconds to process and classify an image) and has comparatively high accuracy. Our system misclassifies only 1.2% of the crystals as non-crystals, most likely due to low illumination or out-of-focus image capture, and has an overall accuracy of 88%.
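The max-class rule described in the abstract can be illustrated with a minimal sketch. It assumes the three categories are encoded as ordered integers (0 = non-crystal, 1 = likely lead, 2 = crystal) and that each classifier returns one such label; the names `max_class_predict`, `strict`, and `lenient` are hypothetical, and the two toy functions only stand in for trained MLPs. This is not the authors' implementation.

```python
from typing import Callable, Sequence

# Assumed ordinal encoding: a higher value means "more crystal-like".
NON_CRYSTAL, LIKELY_LEAD, CRYSTAL = 0, 1, 2

# A classifier maps a feature vector to one of the three labels.
Classifier = Callable[[Sequence[float]], int]


def max_class_predict(classifiers: Sequence[Classifier],
                      features: Sequence[float]) -> int:
    """Return the highest class label predicted by any classifier."""
    if not classifiers:
        raise ValueError("at least one classifier is required")
    return max(clf(features) for clf in classifiers)


# Hypothetical stand-ins for trained MLPs, thresholding a single feature.
def strict(features: Sequence[float]) -> int:
    return CRYSTAL if features[0] > 0.9 else NON_CRYSTAL


def lenient(features: Sequence[float]) -> int:
    return LIKELY_LEAD if features[0] > 0.5 else NON_CRYSTAL


# The strict classifier says non-crystal, the lenient one says likely lead;
# the ensemble keeps the higher label.
label = max_class_predict([strict, lenient], [0.7])
```

Taking the maximum biases the ensemble toward the crystal end of the scale: an image is labeled non-crystal only if every classifier agrees. That trades some false positives for fewer missed crystals, which matches the stated goal.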