School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China.
Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA.
Comput Biol Med. 2020 Sep;124:103959. doi: 10.1016/j.compbiomed.2020.103959. Epub 2020 Aug 6.
Radiomics is a newly emerging field that involves the extraction of large numbers of quantitative features from biomedical images by using data-characterization algorithms. Distinctive imaging features identified from biomedical images can be used for prognosis and therapeutic response prediction, and they can provide a noninvasive approach for personalized therapy. So far, many published radiomics studies have used existing out-of-the-box algorithms, which are not specific to radiomics data, to identify prognostic markers from biomedical images. To better utilize biomedical images, we propose a novel machine learning approach, stability selection supervised principal component analysis (SSSuperPCA), which identifies stable features from radiomics big data and couples this selection with dimension reduction for right-censored survival outcomes. The proposed approach identifies a set of stable features that are highly associated with the survival outcomes in a simple yet meaningful manner, while controlling the per-family error rate. We evaluate the performance of SSSuperPCA using simulations and real data sets for non-small cell lung cancer and head and neck cancer, and compare it with other machine learning algorithms. The results demonstrate that our method has a competitive edge over other existing methods in identifying prognostic markers from biomedical imaging data for the prediction of right-censored survival outcomes.
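The abstract does not spell out the algorithm itself. As a rough, hypothetical sketch of the two ingredients named in the method's title, the Python below combines stability selection (repeated half-sampling, selection frequencies, and the Meinshausen and Bühlmann bound on the per-family error rate) with supervised principal component analysis (the first principal component of the selected features) for a right-censored outcome. The base selector (a univariate Cox score test keeping the top `q` features per half-sample), the function names `cox_score`, `stability_selection` and `supervised_pc`, the default parameters, and the simulated data are all assumptions made for illustration; this is not the authors' SSSuperPCA implementation.

```python
import numpy as np


def cox_score(X, time, event):
    """Score U and information I of a univariate Cox model at beta = 0, per column of X.

    Risk set for subject i = everyone with time >= time[i] (Breslow handling of ties).
    """
    order = np.argsort(-time, kind="stable")          # descending survival time
    Xs, ts, ev = X[order], time[order], event[order].astype(bool)
    # Index of the last subject tied with each time, so risk sets include all ties.
    last = np.searchsorted(-ts, -ts, side="right") - 1
    n_risk = (last + 1)[:, None]
    mean = np.cumsum(Xs, axis=0)[last] / n_risk        # risk-set mean of each feature
    var = np.cumsum(Xs ** 2, axis=0)[last] / n_risk - mean ** 2
    U = (Xs[ev] - mean[ev]).sum(axis=0)
    I = var[ev].sum(axis=0)
    return U, I


def stability_selection(X, time, event, q=12, n_subsamples=100, pi_thr=0.6, seed=0):
    """Stability selection with a univariate Cox score ranking as the base selector.

    Each half-sample keeps the q features with the largest U^2 / I; features kept
    in at least a fraction pi_thr of half-samples are selected.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        U, I = cox_score(X[idx], time[idx], event[idx])
        stat = U ** 2 / np.maximum(I, 1e-12)
        counts[np.argsort(stat)[-q:]] += 1
    freq = counts / n_subsamples
    # Meinshausen-Buhlmann bound on the expected number of false selections
    # (requires 0.5 < pi_thr <= 1 plus their exchangeability assumptions).
    pfer_bound = q ** 2 / ((2 * pi_thr - 1) * p)
    return np.flatnonzero(freq >= pi_thr), freq, pfer_bound


def supervised_pc(X, time, event, selected):
    """First principal component of the standardized selected features, oriented
    so that larger values correspond to a higher estimated hazard."""
    Z = X[:, selected]
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    score = Z @ vt[0]
    U, _ = cox_score(score[:, None], time, event)
    return score if U[0] >= 0 else -score


# Hypothetical simulated data: features 0-7 share a latent factor u that drives the hazard.
rng = np.random.default_rng(1)
n, p = 300, 200
u = rng.standard_normal(n)
X = rng.standard_normal((n, p))
X[:, :8] = u[:, None] + 0.5 * X[:, :8]
t_event = rng.exponential(np.exp(-u))                 # hazard exp(u)
t_cens = rng.exponential(2.0, n)                      # independent censoring
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(int)

selected, freq, pfer_bound = stability_selection(X, time, event)
risk_score = supervised_pc(X, time, event, selected)
print("selected:", selected, "PFER bound:", round(pfer_bound, 2))
```

Supervised principal components only pay off when the selected features share common structure, which is why the simulated block of features is driven by one latent factor. In a full analysis the oriented component score would typically enter a Cox proportional hazards model as the predictor; that step is left out here so the sketch depends on NumPy alone.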