Martinez-Murcia Francisco J, Górriz Juan M, Ramírez Javier, Illán Ignacio A, Segovia Fermín, Castillo-Barnes Diego, Salas-Gonzalez Diego
Signal Processing and Biomedical Application, Department of Signal Theory, Networking and Communication, University of Granada, Granada, Spain.
Department of Scientific Computing, Florida State University, Tallahassee, FL, United States.
Front Neuroinform. 2017 Nov 14;11:65. doi: 10.3389/fninf.2017.00065. eCollection 2017.
The rise of neuroimaging in research and clinical practice, together with the development of new machine learning techniques, has strongly encouraged the Computer Aided Diagnosis (CAD) of different diseases and disorders. However, these algorithms are often tested on proprietary datasets with restricted access, so a direct comparison between CAD procedures is not possible. Furthermore, the sample size is often too small to develop accurate machine learning methods. Multi-center initiatives are currently a very useful, although limited, tool for recruiting large populations and standardizing CAD evaluation. As an alternative, we propose a brain image synthesis procedure intended to generate a new image set that shares characteristics with an original one. Our system focuses on nuclear imaging modalities such as PET or SPECT brain images. We analyze the original dataset by applying PCA, and then model the distribution of samples in the projected eigenbrain space using a Probability Density Function (PDF) estimator. Once the model has been built, we can generate new coordinates in the eigenbrain space belonging to the same class, which can then be projected back to the image space. The system has been evaluated on different functional neuroimaging datasets, assessing the resemblance of the synthetic images to the original ones, the differences between them, their generalization ability, and the independence of the synthetic dataset with respect to the original. The synthetic images maintain the differences between groups found in the original dataset, with no significant differences when compared to real-world samples. Furthermore, they featured a similar performance and generalization capability to that of the original dataset. These results prove that these images are suitable for standardizing the evaluation of CAD pipelines, and for providing data augmentation in machine learning systems (e.g., in deep learning), or even for training future professionals at medical school.
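The synthesis steps described in the abstract (PCA projection, density estimation in the eigenbrain space, sampling, and back-projection) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which PDF estimator is used, so a Gaussian kernel density estimate with a Scott's rule bandwidth is assumed here, and the function names (`fit_eigenbrains`, `sample_kde`, `synthesize`) and the random toy data are hypothetical.

```python
import numpy as np


def fit_eigenbrains(images, n_components):
    """PCA via SVD on flattened images (one row per subject)."""
    mean = images.mean(axis=0)
    centered = images - mean
    # Rows of vt are the principal axes ("eigenbrains").
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T  # coordinates in eigenbrain space
    return mean, components, scores


def sample_kde(scores, n_samples, rng):
    """Draw new coordinates from a Gaussian KDE fitted to the scores.

    Assumption: Scott's rule bandwidth, applied per component.
    """
    n, d = scores.shape
    factor = n ** (-1.0 / (d + 4))
    bandwidth = factor * scores.std(axis=0, ddof=1)
    # Sampling from a KDE: pick a training point, then add kernel noise.
    centers = scores[rng.integers(0, n, size=n_samples)]
    return centers + rng.normal(size=(n_samples, d)) * bandwidth


def synthesize(images, n_components, n_samples, seed=0):
    """Generate synthetic images for ONE class of subjects."""
    rng = np.random.default_rng(seed)
    mean, components, scores = fit_eigenbrains(images, n_components)
    new_scores = sample_kde(scores, n_samples, rng)
    # Project the sampled coordinates back to image space.
    return mean + new_scores @ components


# Toy stand-in for real data: 20 "images" of 64 voxels each.
toy = np.random.default_rng(1).random((20, 64))
synthetic = synthesize(toy, n_components=5, n_samples=10)
print(synthetic.shape)  # (10, 64)
```

To keep generated samples within a single class, as the abstract describes, `synthesize` would be called separately on the images of each class. Real PET or SPECT volumes would first need to be flattened to one row per subject.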