
Active Learning Strategies for Phenotypic Profiling of High-Content Screens.

Authors

Smith Kevin, Horvath Peter

Affiliations

Light Microscopy and Screening Centre, ETH Zurich, Switzerland.

Institute of Biochemistry, ETH Zurich, Switzerland; Synthetic and Systems Biology Unit, Biological Research Center, Szeged, Hungary.

Publication information

J Biomol Screen. 2014 Jun;19(5):685-95. doi: 10.1177/1087057114527313. Epub 2014 Mar 18.

Abstract

High-content screening is a powerful method to discover new drugs and carry out basic biological research. Increasingly, high-content screens have come to rely on supervised machine learning (SML) to perform automatic phenotypic classification as an essential step of the analysis. However, this comes at a cost, namely, the labeled examples required to train the predictive model. Classification performance increases with the number of labeled examples, and because labeling examples demands time from an expert, the training process represents a significant time investment. Active learning strategies attempt to overcome this bottleneck by presenting the most relevant examples to the annotator, thereby achieving high accuracy while minimizing the cost of obtaining labeled data. In this article, we investigate the impact of active learning on single-cell-based phenotype recognition, using data from three large-scale RNA interference high-content screens representing diverse phenotypic profiling problems. We consider several combinations of active learning strategies and popular SML methods. Our results show that active learning significantly reduces the time cost and can be used to reveal the same phenotypic targets identified using SML. We also identify combinations of active learning strategies and SML methods which perform better than others on the phenotypic profiling problems we studied.
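The abstract does not name the specific active learning strategies evaluated. As a rough illustration of the general idea (showing the annotator the examples the current model is least sure about), the following minimal sketch implements least-confidence uncertainty sampling, one common strategy. The function name, the probability values, and the batch size are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def least_confident_indices(
    probabilities: Sequence[Sequence[float]], batch_size: int
) -> List[int]:
    """Return indices of the unlabeled cells whose top class probability is lowest."""
    # Uncertainty score: 1 - max class probability (higher means less confident).
    scores = [1.0 - max(p) for p in probabilities]
    # Rank indices by descending uncertainty; sorted() is stable, so ties keep pool order.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return ranked[:batch_size]


# Hypothetical predicted class probabilities for five unlabeled cells
# over three phenotype classes (illustrative values only).
pool_probabilities = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
    [0.34, 0.33, 0.33],
    [0.80, 0.10, 0.10],
]

# Cells to show the annotator next.
query = least_confident_indices(pool_probabilities, batch_size=2)
print(query)
```

In a full loop, the annotator would label the queried cells, the classifier would be retrained on the enlarged labeled set, and the selection would repeat until accuracy is acceptable or the labeling budget is spent.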

