A cell-level quality control workflow for high-throughput image analysis.

Affiliations

Genomics Institute of the Novartis Research Foundation, 10675 John Jay Hopkins Drive, San Diego, California, 92121, USA.

Shiyu Children Foundation, Room 1006-1008, Genesis Beijing, No. 8 Xinyuan South Road, Chaoyang District, Beijing, 100027, PR China.

Publication information

BMC Bioinformatics. 2020 Jul 2;21(1):280. doi: 10.1186/s12859-020-03603-5.

Abstract

BACKGROUND

Image-based high throughput (HT) screening provides a rich source of information on dynamic cellular response to external perturbations. The large quantity of data generated necessitates computer-aided quality control (QC) methodologies to flag imaging and staining artifacts. Existing image- or patch-level QC methods require separate thresholds to be simultaneously tuned for each image quality metric used, and also struggle to distinguish between artifacts and valid cellular phenotypes. As a result, extensive time and effort must be spent on per-assay QC feature thresholding, and valid images and phenotypes may be discarded while image- and cell-level artifacts go undetected.

RESULTS

We present a novel cell-level QC workflow built on machine learning approaches for classifying artifacts from HT image data. First, a phenotype sampler based on unlabeled clustering collects a comprehensive subset of cellular phenotypes, requiring only the inspection of a handful of images per phenotype for validity. A set of one-class support vector machines are then trained on each biologically valid image phenotype, and used to classify individual objects in each image as valid cells or artifacts. We apply this workflow to two real-world large-scale HT image datasets and observe that the ratio of artifact to total object area (AR) provides a single robust assessment of image quality regardless of the underlying causes of quality issues. Gating on this single intuitive metric, partially contaminated images can be salvaged and highly contaminated images can be excluded before image-level phenotype summary, enabling a more reliable characterization of cellular response dynamics.
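The AR metric and the gating step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The SegmentedObject record, the function names and the max_ar cutoff are all hypothetical. Each per-phenotype model is assumed to be a callable that returns True when it accepts an object's feature vector. In practice such a callable might wrap a trained one-class SVM, for example scikit-learn's OneClassSVM, whose predict method returns +1 for inliers and -1 for outliers. The sketch also assumes a combination rule: an object counts as a valid cell if at least one phenotype model accepts it, and as an artifact otherwise.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Hypothetical type: a per-phenotype model that returns True if it accepts
# the feature vector as belonging to its (biologically valid) phenotype.
PhenotypeModel = Callable[[Sequence[float]], bool]


@dataclass
class SegmentedObject:
    """One segmented object in an image (hypothetical record layout)."""
    area: float        # object area, e.g. in pixels
    is_artifact: bool  # verdict produced by object_is_artifact()


def object_is_artifact(features: Sequence[float], models: Sequence[PhenotypeModel]) -> bool:
    """Assumed combination rule: valid if any phenotype model accepts the object."""
    return not any(model(features) for model in models)


def artifact_ratio(objects: Sequence[SegmentedObject]) -> float:
    """AR = total artifact area / total object area (0.0 for an image with no objects)."""
    total = sum(o.area for o in objects)
    if total == 0:
        return 0.0
    return sum(o.area for o in objects if o.is_artifact) / total


def gate_image(
    objects: Sequence[SegmentedObject], max_ar: float
) -> Optional[List[SegmentedObject]]:
    """Exclude the image (return None) if AR exceeds max_ar; otherwise salvage
    it by keeping only the objects classified as valid cells."""
    if artifact_ratio(objects) > max_ar:
        return None
    return [o for o in objects if not o.is_artifact]


if __name__ == "__main__":
    objs = [SegmentedObject(80.0, False), SegmentedObject(20.0, True)]
    print(artifact_ratio(objs))          # AR of this toy image
    print(gate_image(objs, max_ar=0.5))  # salvaged: the artifact object is dropped
    print(gate_image(objs, max_ar=0.1))  # excluded: returns None
```

A single AR cutoff replaces one threshold per image-quality metric. One way to choose max_ar is to rank images by artifact_ratio and inspect the images near a candidate cutoff. The actual value is assay-specific and is not given in the abstract.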

CONCLUSIONS

Our cell-level QC workflow identifies artificial cells created not only by staining or imaging artifacts but also by the limitations of image segmentation algorithms. The single readout AR, which summarizes the proportion of artifacts in each image, can be used to reliably rank images by quality and to determine QC cutoff thresholds more accurately. Machine learning-based clustering and sampling of cellular phenotypes reduces the manual work required to collect training examples. Our QC workflow automatically handles assay-specific phenotypic variations and generalizes to different HT image assays.
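Clustering-based sampling saves manual work because a reviewer only needs to inspect a few examples per phenotype cluster. Below is a minimal sketch of that review-set step. It assumes each object already carries a cluster label from some unsupervised clustering of per-object features, since the abstract does not specify the algorithm. The function sample_per_cluster and its parameters are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def sample_per_cluster(
    object_ids: Sequence[Hashable],
    cluster_labels: Sequence[int],
    k: int,
    seed: int = 0,
) -> Dict[int, List[Hashable]]:
    """Pick up to k objects from every cluster for manual validity review."""
    if len(object_ids) != len(cluster_labels):
        raise ValueError("object_ids and cluster_labels must have equal length")
    members: Dict[int, List[Hashable]] = defaultdict(list)
    for obj_id, label in zip(object_ids, cluster_labels):
        members[label].append(obj_id)
    rng = random.Random(seed)  # fixed seed makes the review set reproducible
    # Clusters smaller than k contribute all their members, so rare phenotypes
    # still reach the reviewer.
    return {
        label: rng.sample(members[label], min(k, len(members[label])))
        for label in sorted(members)
    }


if __name__ == "__main__":
    ids = [f"cell_{i}" for i in range(10)]
    labels = [0] * 6 + [1] * 3 + [2]
    for label, picked in sample_per_cluster(ids, labels, k=2).items():
        print(label, picked)
```

In this sketch, the reviewer marks each sampled cluster as a valid phenotype or an artifact. Members of the valid clusters could then serve as training examples for that phenotype's one-class model.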

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63b9/7333376/122ea0f1bc71/12859_2020_3603_Fig1_HTML.jpg