Bibault Jean-Emmanuel, Bassenne Maxime, Ren Hongyi, Xing Lei
Laboratory of Artificial Intelligence in Medicine and Biomedical Physics, Stanford University School of Medicine, Stanford, CA 94304, USA.
Radiation Oncology Department, Hôpital Européen Georges Pompidou, Assistance Publique-Hôpitaux de Paris, 75015 Paris, France.
Cancers (Basel). 2020 Dec 19;12(12):3844. doi: 10.3390/cancers12123844.
The worldwide growth of cancer incidence can be explained in part by changes in the prevalence and distribution of risk factors. Estimates of cancer prevalence have geographical gaps, which innovative methods could fill. We used deep learning (DL) features extracted from satellite images to predict cancer prevalence at the census tract level in seven cities in the United States. We trained the model on the detailed 2018 cancer prevalence estimates available from the CDC (Centers for Disease Control and Prevention) 500 Cities project. Data from 3500 census tracts covering 14,483,366 inhabitants were included. Features were extracted from 170,210 satellite images with deep learning. This method explained up to 64.37% (median = 43.53%) of the variation in cancer prevalence. Satellite features are highly correlated with individual socioeconomic and health measures that are linked to cancer prevalence (age, smoking and drinking status, and obesity). A higher similarity between two environments is associated with better generalization of the model (p = 1 × 10⁻⁶). This method can be used to accurately estimate cancer prevalence at a high spatial resolution, without surveys and at a fraction of their cost.
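The "variation explained" figures in the abstract are coefficients of determination (R²) for predictions made from image-derived features. The following is a minimal sketch of that evaluation step only, under stated assumptions: the feature matrix is synthetic random data standing in for DL features, the regressor is a closed-form ridge regression chosen for illustration, and the names `fit_ridge` and `r_squared` are hypothetical. It is not the authors' pipeline and does not reproduce their results.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is handled by centering."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    n_features = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


# Synthetic stand-in data (assumption): one feature vector per census tract.
rng = np.random.default_rng(0)
n_tracts, n_features = 500, 32
X = rng.normal(size=(n_tracts, n_features))
true_w = rng.normal(size=n_features)
# Noise has the same variance as the signal, so about half of the
# variation in y is explainable from X by construction.
y = X @ true_w + rng.normal(scale=np.linalg.norm(true_w), size=n_tracts)

# Fit on the first 400 tracts and evaluate on the held-out 100.
train, test = slice(0, 400), slice(400, None)
w, b = fit_ridge(X[train], y[train], alpha=1.0)
r2 = r_squared(y[test], X[test] @ w + b)
print(f"held-out R^2 = {r2:.3f}")
```

Evaluating on held-out tracts rather than the training tracts matters here because the abstract's claim about generalization concerns how well a model carries over to environments it was not fitted on.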