Mining Google Trends data for nowcasting and forecasting colorectal cancer (CRC) prevalence.

Author information

Tudor Cristiana, Sova Robert Aurelian

Affiliation

Bucharest University of Economic Studies, Bucharest, Romania.

Publication information

PeerJ Comput Sci. 2023 Oct 4;9:e1518. doi: 10.7717/peerj-cs.1518. eCollection 2023.

Abstract

BACKGROUND

Colorectal cancer (CRC) is the third most prevalent and second most lethal form of cancer in the world. Consequently, CRC prevalence projections are essential for assessing the future burden of the disease, planning resource allocation, and developing service delivery strategies, as well as for grasping the shifting environment of cancer risk factors. However, unlike cancer incidence and mortality rates, national and international agencies do not routinely issue projections for cancer prevalence. Moreover, the limited or even nonexistent cancer statistics for large portions of the world, along with the high heterogeneity among nations, further complicate the task of producing timely and accurate CRC prevalence projections. In this situation, population interest, as reflected in Internet searches, can be valuable for improving cancer statistics and, in the long run, for supporting cancer research.

METHODS

This study aims to model, nowcast and forecast CRC prevalence at the global level using a three-step framework that incorporates three well-established univariate statistical and machine-learning models. First, data mining is performed to evaluate the relevance of Google Trends (GT) data as a surrogate for the number of CRC survivors. The results demonstrate that population web-search interest in the term "colonoscopy" is the most reliable indicator to nowcast CRC disease prevalence. Then, various statistical and machine-learning models, including ARIMA, ETS, and FNNAR, are trained and tested using relevant GT time series. Finally, the updated monthly query series spanning 2004-2022 and the best forecasting model in terms of out-of-sample forecasting ability (i.e., the neural network autoregression) are utilized to generate point forecasts up to 2025.
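
The model-selection step described above (train on one part of the series, score each candidate on a held-out tail, keep the one with the lowest error) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the series is synthetic rather than Google Trends data, RMSE is assumed as the accuracy measure, and the two candidates (a seasonal naive and a drift forecaster) are simple stand-ins, not the ARIMA, ETS, or FNNAR models used in the study.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def seasonal_naive(train, horizon, period=12):
    """Stand-in model: repeat the last observed season (12 months by default)."""
    start = len(train) - period
    return [train[start + (h % period)] for h in range(horizon)]

def drift(train, horizon):
    """Stand-in model: extend the straight line from the first to the last observation."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (h + 1) for h in range(horizon)]

def select_model(series, test_size, models):
    """Hold out the last `test_size` points and rank models by out-of-sample RMSE."""
    train, test = series[:-test_size], series[-test_size:]
    scores = {name: rmse(test, fn(train, len(test))) for name, fn in models.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic monthly series (hypothetical data): linear trend plus annual seasonality.
series = [50 + 0.1 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(120)]

models = {"seasonal_naive": seasonal_naive, "drift": drift}
best, scores = select_model(series, test_size=24, models=models)
print(best, scores)
```

In a fuller workflow, the selected model would then be refit on the complete series before producing forecasts beyond the last observation, which mirrors the final step of the framework.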

RESULTS

Results show that the number of people with colorectal cancer will continue to rise over the next 24 months. This in turn emphasizes the urgency for public policies aimed at reducing the population's exposure to the principal modifiable risk factors, such as lifestyle and nutrition. In addition, given the major drop in population interest in CRC during the first wave of the COVID-19 pandemic, the findings suggest that public health authorities should implement measures to increase cancer screening rates during pandemics. This in turn would deliver positive externalities, including the mitigation of the global burden and the enhancement of the quality of official statistics.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fafc/10588692/2c3aa4a42da7/peerj-cs-09-1518-g001.jpg
