Fast automated cell phenotype image classification.

Authors

Hamilton Nicholas A, Pantelic Radosav S, Hanson Kelly, Teasdale Rohan D

Affiliation

ARC Centre in Bioinformatics, University of Queensland, Brisbane, Queensland 4072, Australia.

Publication

BMC Bioinformatics. 2007 Mar 30;8:110. doi: 10.1186/1471-2105-8-110.

Abstract

BACKGROUND

The genomic revolution has led to rapid growth in sequencing of genes and proteins, and attention is now turning to the function of the encoded proteins. In this respect, microscope imaging of a protein's sub-cellular localisation is proving invaluable, and recent advances in automated fluorescent microscopy allow protein localisations to be imaged in high throughput. Hence there is a need for large-scale automated computational techniques to efficiently quantify, distinguish and classify sub-cellular images. While image statistics have proved highly successful in distinguishing localisation, commonly used measures are relatively slow to compute and often require cells to be individually selected from experimental images, which limits both throughput and the range of potential applications. Here we introduce threshold adjacency statistics, the essence of which is to threshold the image and, for each possible number of neighbours, to count the above-threshold pixels that have that many above-threshold pixels adjacent to them. These novel measures are shown to distinguish and classify images of distinct sub-cellular localisation with high speed and accuracy without image cropping.
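The counting idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the neighbourhood, the threshold rule, or any normalisation, so the 8-connected neighbourhood, the strict `>` comparison, the treatment of out-of-image positions as below threshold, and the helper names `threshold_adjacency_counts` and `normalise` are all assumptions made for illustration.

```python
def threshold_adjacency_counts(image, threshold):
    """Count above-threshold pixels by their number of above-threshold neighbours.

    `image` is a list of rows of numeric intensities. The result is a list of
    nine integers: counts[k] is the number of above-threshold pixels that have
    exactly k above-threshold pixels among their 8 neighbours (assumed
    neighbourhood; positions outside the image count as below threshold).
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0

    # Step 1: binarise the image (strict comparison is an assumption).
    mask = [[1 if value > threshold else 0 for value in row] for row in image]

    # Step 2: for each above-threshold pixel, count above-threshold neighbours.
    counts = [0] * 9
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            neighbours = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the pixel itself
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        neighbours += mask[rr][cc]
            counts[neighbours] += 1
    return counts


def normalise(counts):
    """Divide by the number of above-threshold pixels (assumed normalisation)."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


if __name__ == "__main__":
    toy_image = [
        [0, 0, 0, 0],
        [0, 9, 9, 0],
        [0, 9, 0, 0],
        [0, 0, 0, 7],
    ]
    counts = threshold_adjacency_counts(toy_image, threshold=5)
    print(counts)             # [1, 0, 3, 0, 0, 0, 0, 0, 0]
    print(normalise(counts))  # [0.25, 0.0, 0.75, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

In the toy image, the three bright pixels in the upper-left cluster each touch the other two, and the isolated pixel in the corner touches none, so the counts fall into the "two neighbours" and "zero neighbours" bins. Because the statistic is computed over the whole image in a single pass, no per-cell cropping is involved, which is consistent with the speed and no-cropping claims in the abstract.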

RESULTS

Threshold adjacency statistics are applied to classification of protein sub-cellular localization images. They are tested on two image sets (available for download), one for which fluorescently tagged proteins are endogenously expressed in 10 sub-cellular locations, and another for which proteins are transfected into 11 locations. For each image set, a support vector machine was trained and tested. Classification accuracies of 94.4% and 86.6% are obtained on the endogenous and transfected sets, respectively. Threshold adjacency statistics are found to provide comparable or higher accuracy than other commonly used statistics while being an order of magnitude faster to calculate. Further, threshold adjacency statistics in combination with Haralick measures give accuracies of 98.2% and 93.2% on the endogenous and transfected sets, respectively.

CONCLUSION

Threshold adjacency statistics have the potential to greatly extend the scale and range of applications of image statistics in computational image analysis. They remove the need for cropping of individual cells from images, and are an order of magnitude faster to calculate than other commonly used statistics while providing comparable or better classification accuracy, both essential requirements for application to large-scale approaches.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e006/1847687/90e1eb17829a/1471-2105-8-110-1.jpg
