IEEE Trans Med Imaging. 2019 Apr;38(4):1037-1047. doi: 10.1109/TMI.2018.2877080. Epub 2018 Oct 22.
The identification and quantification of markers in medical images is critical for diagnosis, prognosis, and disease management. Supervised machine learning enables the detection and exploitation of findings that are known a priori, once experts have annotated training examples. However, supervision does not scale well, because of the number of training examples required and because the marker vocabulary is limited to known entities. In this proof-of-concept study, we propose unsupervised identification of anomalies as candidates for markers in retinal optical coherence tomography (OCT) imaging data, without constraining them to a priori definitions. We identify and categorize marker candidates that occur frequently in the data and demonstrate that these markers have predictive value in the task of detecting disease. A careful qualitative analysis of the identified data-driven markers reveals how their quantifiable occurrence aligns with our current understanding of disease course in patients with early and late age-related macular degeneration (AMD). A multi-scale deep denoising autoencoder is trained on healthy images, and a one-class support vector machine identifies anomalies in new data. Clustering of the anomalies identifies stable categories. Using these markers to classify healthy, early AMD, and late AMD cases yields an accuracy of 81.40%. In a second binary classification experiment on a publicly available data set (healthy versus intermediate AMD), the model achieves an area under the ROC curve of 0.944.
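The three stages named in the abstract (a representation learned from healthy data only, a one-class support vector machine that flags anomalies in new data, and clustering of the flagged anomalies into categories) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the data are synthetic vectors rather than OCT patches, a whitened PCA stands in for the multi-scale deep denoising autoencoder, k-means stands in for the clustering step (the abstract does not say which algorithm was used), and the names `make_patches`, `mixing`, and the chosen parameters are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Hypothetical linear map from 5 latent factors to 20 "pixel" values.
mixing = rng.normal(size=(5, 20))


def make_patches(latent_center, latent_scale, n):
    """Draw n synthetic patches whose latent factors scatter around latent_center."""
    latent = rng.normal(latent_center, latent_scale, size=(n, 5))
    return latent @ mixing + rng.normal(0.0, 0.1, size=(n, 20))


# Synthetic stand-ins for image patches (assumption: not real OCT data).
healthy_train = make_patches(np.zeros(5), 1.0, 500)
healthy_new = make_patches(np.zeros(5), 1.0, 100)
anomaly_a = make_patches(np.array([8.0, 0.0, 0.0, 0.0, 0.0]), 0.5, 50)
anomaly_b = make_patches(np.array([0.0, -8.0, 0.0, 0.0, 0.0]), 0.5, 50)
new_patches = np.vstack([healthy_new, anomaly_a, anomaly_b])
true_kind = np.repeat([0, 1, 2], [100, 50, 50])  # 0 = healthy, 1 and 2 = anomaly types

# Stage 1: learn a compact representation from healthy data only.
# Whitened PCA is a simple stand-in for the autoencoder described in the abstract.
encoder = PCA(n_components=5, whiten=True).fit(healthy_train)

# Stage 2: fit a one-class SVM on the healthy codes; -1 marks an anomaly.
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
detector.fit(encoder.transform(healthy_train))
new_codes = encoder.transform(new_patches)
is_anomaly = detector.predict(new_codes) == -1

# Stage 3: cluster the flagged codes into candidate marker categories.
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0)
categories = clusterer.fit_predict(new_codes[is_anomaly])

print("flagged fraction, healthy:", is_anomaly[true_kind == 0].mean())
print("flagged fraction, anomalous:", is_anomaly[true_kind > 0].mean())
print("category sizes:", np.bincount(categories))
```

In this sketch the per-category counts of flagged patches would play the role of the quantifiable marker occurrences that a downstream classifier could use; that final classification step is not shown here.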