Combining handcrafted features with latent variables in machine learning for prediction of radiation-induced lung damage.

Affiliations

Applied Physics Program, University of Michigan, Ann Arbor, MI, USA.

Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA.

Publication information

Med Phys. 2019 May;46(5):2497-2511. doi: 10.1002/mp.13497. Epub 2019 Apr 8.

DOI: 10.1002/mp.13497

PMID: 30891794

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6510637/

Abstract

PURPOSE

There has been burgeoning interest in applying machine learning methods to predict radiotherapy outcomes. However, the imbalance between a large number of variables and a limited sample size in radiation oncology constitutes a major challenge, so dimensionality reduction methods can be key to success. This study investigates and contrasts traditional machine learning methods and deep learning approaches for outcome modeling in radiotherapy. In particular, new joint architectures based on a variational autoencoder (VAE) for dimensionality reduction are presented, and their application is demonstrated for the prediction of radiation pneumonitis (RP) from a large-scale heterogeneous dataset.

METHODS

RP was modeled using a large-scale heterogeneous dataset from 106 non-small cell lung cancer (NSCLC) patients who received radiotherapy. The dataset contained a pool of 230 variables, including clinical factors (e.g., dose, KPS, stage) and biomarkers (e.g., single nucleotide polymorphisms (SNPs), cytokines, and micro-RNAs). Twenty-two patients had grade 2 or higher RP. Four approaches were investigated: feature selection (case A) and feature extraction (case B) with traditional machine learning methods, a VAE-MLP joint architecture with deep learning (case C), and, lastly, the combination of feature selection and the joint architecture (case D). For feature selection, random forest (RF), support vector machine (SVM), and multilayer perceptron (MLP) models were implemented to select relevant features. Specifically, each method was run multiple times to rank features within several cross-validated (CV) resampled sets. The resulting ranking lists were then aggregated using top-5% and Kemeny graph methods to identify the final ranking used for prediction. A synthetic minority oversampling technique (SMOTE) was applied during this process to correct for class imbalance. For deep learning, a VAE-MLP joint architecture was developed in which the VAE performs dimensionality reduction and the MLP performs classification. In this architecture, the reconstruction loss and the prediction loss were combined into a single loss function so that both parts are trained simultaneously, and class weights were assigned to mitigate class imbalance. To evaluate and compare prediction performance, areas under the receiver operating characteristic curve (AUCs) were computed with nested CV for both the handcrafted feature selection and the deep learning approaches. The significance of differences in AUCs was assessed using the DeLong test based on U-statistics.
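The abstract does not describe how the Kemeny aggregation was implemented. As a minimal sketch, assume that each method (RF, SVM, MLP) returns a complete best-first ranking of the same features. Under that assumption, exact Kemeny aggregation selects the ordering with the smallest total Kendall tau distance to all input rankings. The feature names are hypothetical. Because the brute-force search over permutations is feasible only for a handful of features, a 230-variable pool would need a heuristic or an integer-programming solver. The paper's "Kemeny graph" method may also differ in detail, and the top-5% aggregation step is not shown.

```python
from itertools import combinations, permutations


def kendall_tau_distance(rank_a, rank_b):
    """Number of feature pairs that two best-first rankings order differently."""
    pos_a = {feat: i for i, feat in enumerate(rank_a)}
    pos_b = {feat: i for i, feat in enumerate(rank_b)}
    return sum(
        1
        for x, y in combinations(rank_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def kemeny_aggregate(rankings):
    """Exact Kemeny consensus by exhaustive search (O(n!) in the number of features)."""
    best, best_cost = None, None
    for candidate in permutations(rankings[0]):
        cost = sum(kendall_tau_distance(candidate, r) for r in rankings)
        if best_cost is None or cost < best_cost:
            best, best_cost = list(candidate), cost
    return best, best_cost


# Hypothetical best-first rankings from three feature-selection methods
rankings = [
    ["dose", "KPS", "stage", "snp_1"],  # e.g. RF
    ["dose", "stage", "KPS", "snp_1"],  # e.g. SVM
    ["KPS", "dose", "stage", "snp_1"],  # e.g. MLP
]
consensus, cost = kemeny_aggregate(rankings)
print(consensus, cost)  # ['dose', 'KPS', 'stage', 'snp_1'] 2
```

Here the consensus agrees with the pairwise majority in every pair. It disagrees with the SVM and MLP lists on one pair each, which gives a total cost of 2.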
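SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The plain-Python sketch below illustrates that idea and is not the authors' code; the function name, `k`, and the seed are assumptions. In a cross-validated pipeline, oversampling would normally be applied to the training folds only, so that synthetic points never leak into held-out data. For scale, fully balancing the whole cohort (22 positive vs. 84 negative patients) would require 62 synthetic positives.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Minimal SMOTE sketch: interpolate between minority samples and their k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance)
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Toy 2-feature minority class (hypothetical values)
positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(positives, n_synthetic=6, k=2, seed=1)
print(len(new_points))  # 6
```

Each new point lies on a segment between two real minority samples. Oversampling therefore fills in the minority region rather than duplicating patients.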
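The abstract states only that a reconstruction loss and a prediction loss were combined into one objective with class weights. One common way to write such an objective, assumed here rather than taken from the paper, has two parts. The first is the standard VAE loss: squared-error reconstruction plus the KL divergence of a diagonal Gaussian latent code from N(0, I). The second is a class-weighted binary cross-entropy on the MLP output. The trade-off factor `alpha` and the "balanced" class-weight formula n / (2 n_c) are hypothetical choices. With 22 positives among 106 patients, that formula gives weights of about 2.41 for RP and 0.63 for no RP.

```python
import math


def balanced_class_weights(labels):
    """Hypothetical 'balanced' weights: n_samples / (n_classes * n_samples_in_class)."""
    n = len(labels)
    classes = sorted(set(labels))
    return {c: n / (len(classes) * labels.count(c)) for c in classes}


def joint_vae_mlp_loss(x, x_recon, mu, logvar, p_pred, y, class_weights, alpha=1.0):
    """Per-patient joint loss = reconstruction + KL + alpha * class-weighted BCE.

    x, x_recon    : input features and the VAE decoder's reconstruction
    mu, logvar    : encoder mean and log-variance of the latent code (e.g. 2-D)
    p_pred        : MLP-predicted probability of grade >= 2 RP
    y             : observed label (1 = RP, 0 = no RP)
    class_weights : mapping label -> weight (larger for the minority class)
    alpha         : hypothetical weight on the prediction term
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    kl = -0.5 * sum(1 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, logvar))
    p = min(max(p_pred, 1e-7), 1 - 1e-7)  # clip to keep log() finite
    bce = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return recon + kl + alpha * class_weights[y] * bce


weights = balanced_class_weights([1] * 22 + [0] * 84)
loss = joint_vae_mlp_loss(
    x=[0.2, 0.5], x_recon=[0.1, 0.5], mu=[0.0, 0.0], logvar=[0.0, 0.0],
    p_pred=0.5, y=1, class_weights=weights,
)
```

In an actual network, these terms would be averaged over a mini-batch and minimized with automatic differentiation. Because both terms share one objective, gradients from the classification error also shape the latent space the VAE learns.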

RESULTS

An MLP-based method using weight pruning (WP) feature selection yielded the best performance among the handcrafted feature selection methods (case A), reaching an AUC of 0.804 (95% CI: 0.761-0.823) with the 29 top-ranked features. The VAE-MLP joint architecture (case C) achieved a comparable but slightly lower AUC of 0.781 (95% CI: 0.737-0.808) with a latent dimension of 2. Combining the handcrafted features (case A) with the latent representation (case C), i.e., case D, with an MLP classifier achieved a significantly higher AUC of 0.831 (95% CI: 0.805-0.863) with 22 features. The improvement was significant compared with handcrafted features only (case A; P-value = 0.000642) and with the VAE alone (case C; P-value = 0.000453).
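The P-values above come from the DeLong test, which compares two correlated AUCs evaluated on the same patients. The plain-Python sketch below follows the standard DeLong formulation. Each AUC is a Mann-Whitney U-statistic, its per-patient "structural components" yield a covariance estimate, and the AUC difference is referred to a normal distribution. It is an illustration under these assumptions, not the authors' code, and it ignores how predictions from the nested CV folds were pooled. The toy scores are hypothetical.

```python
import math


def _psi(x, y):
    """Mann-Whitney kernel: 1 if the positive outscores the negative, 0.5 on ties."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _structural_components(scores, labels):
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]  # one per positive
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]  # one per negative
    return sum(v10) / len(v10), v10, v01  # the AUC is the mean of v10


def _cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test of AUC_a == AUC_b on the same labelled samples."""
    auc_a, v10_a, v01_a = _structural_components(scores_a, labels)
    auc_b, v10_b, v01_b = _structural_components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    var = (
        (_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
        + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n
    )
    if var <= 0:
        raise ValueError("zero variance estimate; the test is undefined for these scores")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p


labels = [1, 1, 0, 0]
auc_a, auc_b, z, p = delong_test([0.9, 0.8, 0.3, 0.1], [0.9, 0.2, 0.3, 0.1], labels)
print(auc_a, auc_b, round(z, 4), round(p, 4))  # 1.0 0.75 0.7071 0.4795
```

The double loop costs O(m n), which is trivial at 22 positives and 84 negatives. Larger studies usually compute the same statistic with a sorting-based midrank formulation.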

CONCLUSION

This study demonstrates the potential of combining traditional machine learning methods with deep learning VAE techniques to cope with the limited datasets available for modeling radiotherapy toxicities. Specifically, latent variables from a VAE-MLP joint architecture can complement handcrafted features for the prediction of RP, improving prediction over either method alone.
