Active Learning Strategies for Phenotypic Profiling of High-Content Screens.

Authors

Smith Kevin, Horvath Peter

Affiliations

Light Microscopy and Screening Centre, ETH Zurich, Switzerland.

Institute of Biochemistry, ETH Zurich, Switzerland; Synthetic and Systems Biology Unit, Biological Research Center, Szeged, Hungary.

Publication information

J Biomol Screen. 2014 Jun;19(5):685-95. doi: 10.1177/1087057114527313. Epub 2014 Mar 18.

DOI:10.1177/1087057114527313

PMID:24643256

Abstract

High-content screening is a powerful method to discover new drugs and carry out basic biological research. Increasingly, high-content screens have come to rely on supervised machine learning (SML) to perform automatic phenotypic classification as an essential step of the analysis. However, this comes at a cost, namely, the labeled examples required to train the predictive model. Classification performance increases with the number of labeled examples, and because labeling examples demands time from an expert, the training process represents a significant time investment. Active learning strategies attempt to overcome this bottleneck by presenting the most relevant examples to the annotator, thereby achieving high accuracy while minimizing the cost of obtaining labeled data. In this article, we investigate the impact of active learning on single-cell-based phenotype recognition, using data from three large-scale RNA interference high-content screens representing diverse phenotypic profiling problems. We consider several combinations of active learning strategies and popular SML methods. Our results show that active learning significantly reduces the time cost and can be used to reveal the same phenotypic targets identified using SML. We also identify combinations of active learning strategies and SML methods which perform better than others on the phenotypic profiling problems we studied.
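The loop described in the abstract, in which the model repeatedly asks the annotator to label only the most informative cells, can be sketched as pool-based active learning. The abstract does not name the query strategies or classifiers the authors evaluated, so everything below is an illustrative assumption: margin-based uncertainty sampling, a nearest-centroid classifier standing in for an SML method, synthetic two-dimensional "cell features", and the hypothetical helper names `make_toy_cells`, `fit_centroids`, `margin`, `predict`, and `active_learning`.

```python
import math
import random


def make_toy_cells(n_per_class, seed=0, separation=8.0):
    """Synthetic stand-in for per-cell feature vectors: two Gaussian blobs."""
    rng = random.Random(seed)
    pool, labels = [], []
    for label, center in enumerate((0.0, separation)):
        for _ in range(n_per_class):
            pool.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
            labels.append(label)
    return pool, labels


def fit_centroids(X, y):
    """Nearest-centroid 'training': mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}


def margin(x, centroids):
    """Gap between the two smallest centroid distances; small means uncertain."""
    d = sorted(math.dist(x, c) for c in centroids.values())
    return d[1] - d[0]


def predict(x, centroids):
    """Assign the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: math.dist(x, centroids[c]))


def active_learning(pool, oracle, seed_idx, n_queries):
    """Pool-based loop: retrain, query the least certain cell, ask the oracle.

    `oracle(i)` plays the role of the expert annotator. `seed_idx` must
    cover at least two classes so that a margin can be computed.
    """
    labeled = {i: oracle(i) for i in seed_idx}
    for _ in range(n_queries):
        centroids = fit_centroids([pool[i] for i in labeled], list(labeled.values()))
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        query = min(unlabeled, key=lambda i: margin(pool[i], centroids))
        labeled[query] = oracle(query)
    centroids = fit_centroids([pool[i] for i in labeled], list(labeled.values()))
    return centroids, labeled
```

In this sketch the annotation cost is simply the number of oracle calls (`len(seed_idx) + n_queries`), which is the quantity an active learning strategy tries to keep small for a given accuracy. In a real screen the oracle would be an expert labeling cell images, and the classifier and query rule would be chosen from the combinations compared in the article.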

Similar articles

Active Learning Strategies for Phenotypic Profiling of High-Content Screens.

J Biomol Screen. 2014 Jun;19(5):685-95. doi: 10.1177/1087057114527313. Epub 2014 Mar 18.

Machine learning improves the precision and robustness of high-content screens: using nonlinear multiparametric methods to analyze screening results.

J Biomol Screen. 2011 Oct;16(9):1059-67. doi: 10.1177/1087057111414878. Epub 2011 Aug 1.

Combining Supervised and Unsupervised Machine Learning Methods for Phenotypic Functional Genomics Screening.

SLAS Discov. 2020 Jul;25(6):655-664. doi: 10.1177/2472555220919345. Epub 2020 May 13.

Large-scale tracking and classification for automatic analysis of cell migration and proliferation, and experimental optimization of high-throughput screens of neuroblastoma cells.

Cytometry A. 2015 Jun;87(6):524-40. doi: 10.1002/cyto.a.22632. Epub 2015 Jan 28.

Digging deep into Golgi phenotypic diversity with unsupervised machine learning.

Mol Biol Cell. 2017 Dec 1;28(25):3686-3698. doi: 10.1091/mbc.E17-06-0379. Epub 2017 Oct 11.

Using information from historical high-throughput screens to predict active compounds.

J Chem Inf Model. 2014 Jul 28;54(7):1880-91. doi: 10.1021/ci500190p. Epub 2014 Jun 26.

SemiBoost: boosting for semi-supervised learning.

IEEE Trans Pattern Anal Mach Intell. 2009 Nov;31(11):2000-14. doi: 10.1109/TPAMI.2008.235.

Advanced Cell Classifier: User-Friendly Machine-Learning-Based Software for Discovering Phenotypes in High-Content Imaging Data.

Cell Syst. 2017 Jun 28;4(6):651-655.e5. doi: 10.1016/j.cels.2017.05.012. Epub 2017 Jun 21.

Using iterative cluster merging with improved gap statistics to perform online phenotype discovery in the context of high-throughput RNAi screens.

BMC Bioinformatics. 2008 Jun 5;9:264. doi: 10.1186/1471-2105-9-264.

Data-analysis strategies for image-based cell profiling.

Nat Methods. 2017 Aug 31;14(9):849-863. doi: 10.1038/nmeth.4397.

Cited by

Contributions of deep learning to automated numerical modelling of the interaction of electric fields and cartilage tissue based on 3D images.

Front Bioeng Biotechnol. 2023 Aug 29;11:1225495. doi: 10.3389/fbioe.2023.1225495. eCollection 2023.

Image-based and machine learning-guided multiplexed serology test for SARS-CoV-2.

Cell Rep Methods. 2023 Aug 22;3(8):100565. doi: 10.1016/j.crmeth.2023.100565. eCollection 2023 Aug 28.

Unleashing high content screening in hit detection - Benchmarking AI workflows including novelty detection.

Comput Struct Biotechnol J. 2022 Sep 27;20:5453-5465. doi: 10.1016/j.csbj.2022.09.023. eCollection 2022.

Deep Visual Proteomics defines single-cell identity and heterogeneity.

Nat Biotechnol. 2022 Aug;40(8):1231-1240. doi: 10.1038/s41587-022-01302-5. Epub 2022 May 19.

Regression plane concept for analysing continuous cellular processes with machine learning.

Nat Commun. 2021 May 5;12(1):2532. doi: 10.1038/s41467-021-22866-x.

Accelerated knowledge discovery from omics data by optimal experimental design.

Nat Commun. 2020 Oct 6;11(1):5026. doi: 10.1038/s41467-020-18785-y.

Interactive machine learning for fast and robust cell profiling.

PLoS One. 2020 Sep 11;15(9):e0237972. doi: 10.1371/journal.pone.0237972. eCollection 2020.

Environmental properties of cells improve machine learning-based phenotype recognition accuracy.

Sci Rep. 2018 Jul 4;8(1):10085. doi: 10.1038/s41598-018-28482-y.

Machine learning applications in cell image analysis.

Immunol Cell Biol. 2017 Jul;95(6):525-530. doi: 10.1038/icb.2017.16. Epub 2017 Mar 15.

A Critical and Comparative Review of Fluorescent Tools for Live-Cell Imaging.

Annu Rev Physiol. 2017 Feb 10;79:93-117. doi: 10.1146/annurev-physiol-022516-034055. Epub 2016 Nov 16.
