Liu Roy, Freund Yoav, Spraggon Glen
University of California at San Diego, USA.
Acta Crystallogr D Biol Crystallogr. 2008 Dec;64(Pt 12):1187-95. doi: 10.1107/S090744490802982X. Epub 2008 Nov 18.
The ability of computers to learn from and annotate large databases of crystallization-trial images can not only reduce the workload of crystallization studies, but also provides an opportunity to annotate crystallization trials as part of a framework for improving screening methods. Here, a system is presented that scores sets of images based on the likelihood that they contain crystalline material, as perceived by a machine-learning algorithm. The system can be incorporated into existing crystallization-analysis pipelines, whereby specialists examine images as they normally would, except that the images appear in rank order according to a simple real-valued score. Promising results are shown for 319 112 images associated with 150 structures solved by the Joint Center for Structural Genomics pipeline during the 2006-2007 year. Overall, when an absolute score cutoff is used for screening images, the algorithm achieves a mean receiver operating characteristic score of 0.919 and a 78% reduction in human effort per set, while incurring a loss of five out of 150 structures.
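The ranking and cutoff workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`rank_images`, `roc_auc`, `screen_with_cutoff`) and the toy scores and labels are hypothetical, and the abstract does not specify how the classifier produces its scores or exactly how the ROC score and effort reduction were computed. The sketch assumes each image has a real-valued score and a binary label (1 for crystalline material, 0 otherwise), and treats a set as "lost" when the cutoff removes every crystal-containing image.

```python
from typing import List, Sequence, Tuple


def rank_images(scores: Sequence[float]) -> List[int]:
    """Return image indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve: the fraction of (crystal, non-crystal)
    pairs in which the crystal image has the higher score (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative image")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def screen_with_cutoff(
    scores: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[float, bool]:
    """Keep only images scoring at or above the cutoff.

    Returns (fraction of images a human no longer has to inspect,
    whether at least one crystal image survives the cutoff).
    """
    if not scores:
        raise ValueError("empty image set")
    kept = [y for s, y in zip(scores, labels) if s >= cutoff]
    skipped_fraction = 1.0 - len(kept) / len(scores)
    return skipped_fraction, any(y == 1 for y in kept)


# Toy image set (hypothetical values, not data from the paper).
scores = [0.9, 0.2, 0.75, 0.1, 0.4, 0.05]
labels = [1, 0, 0, 0, 1, 0]

order = rank_images(scores)          # order in which a specialist would view images
auc = roc_auc(scores, labels)        # ranking quality for this set
skipped, crystal_kept = screen_with_cutoff(scores, labels, cutoff=0.5)
```

Averaging `roc_auc` and the skipped fraction over many image sets would give summary figures of the same kind as those quoted in the abstract, while counting sets where `crystal_kept` is false would give the number of structures lost at a given cutoff.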