Automatic classification and pattern discovery in high-throughput protein crystallization trials.

Authors

Cumbaa Christian, Jurisica Igor

Affiliation

Ontario Cancer Institute, Northeast Structural Genomics Consortium, 610 University Avenue, Toronto, Ontario M5G 2M9, Canada.

Publication

J Struct Funct Genomics. 2005;6(2-3):195-202. doi: 10.1007/s10969-005-5243-9.

Abstract

Conceptually, protein crystallization can be divided into two phases: search and optimization. Robotic protein crystallization screening can speed up the search phase, and has the potential to increase process quality. Automated image classification helps to increase throughput and to generate objective results consistently. Although the classification accuracy can always be improved, our image analysis system can classify images from 1,536-well plates with high classification accuracy (85%) and ROC score (0.87), as evaluated on 127 human-classified protein screens containing 5,600 crystal images and 189,472 non-crystal images. Data mining can integrate results from high-throughput screens with information about crystallizing conditions, intrinsic protein properties, and results from crystallization optimization. We apply association mining, a data mining approach that identifies frequently occurring patterns among variables and their values. This approach segregates proteins into groups based on how they react in a broad range of conditions, and clusters cocktails to reflect their potential to achieve crystallization. These results may lead to crystallization screen optimization, and reveal associations between protein properties and crystallization conditions. We also postulate that past experience may lead us to the identification of initial conditions favorable to crystallization for novel proteins.
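
The abstract describes association mining only in general terms, as finding frequently occurring patterns among variables and their values. A minimal sketch of that idea follows, assuming each crystallization trial is encoded as a set of "variable=value" items. The toy trials, the item names, and the functions `support`, `confidence`, and `frequent_itemsets` are all hypothetical illustrations, and the brute-force enumeration is a generic technique, not necessarily the algorithm the authors used.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each trial is a set of "variable=value" items.
# These values are invented for illustration and are not from the paper.
TRIALS = [
    {"precipitant=PEG", "pH=low", "outcome=crystal"},
    {"precipitant=PEG", "pH=low", "outcome=crystal"},
    {"precipitant=PEG", "pH=high", "outcome=clear"},
    {"precipitant=salt", "pH=low", "outcome=precipitate"},
    {"precipitant=salt", "pH=high", "outcome=clear"},
]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base if base else 0.0


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force enumeration of itemsets with support >= min_support."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


if __name__ == "__main__":
    # List every pattern seen in at least 40% of the toy trials.
    for itemset, sup in sorted(
        frequent_itemsets(TRIALS, min_support=0.4).items(),
        key=lambda kv: (-kv[1], sorted(kv[0])),
    ):
        print(f"{sup:.2f}  {sorted(itemset)}")
    # Strength of the rule "precipitant=PEG -> outcome=crystal".
    print(confidence(TRIALS, {"precipitant=PEG"}, {"outcome=crystal"}))
```

Enumerating every combination is only practical for small item counts. On data at the scale of the screens described above, a pruning algorithm such as Apriori or FP-growth would be the more usual choice.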
