Kim Seongyoon, Lee Sanghee, Choi Jung-Il, Cho Hyunsoon
School of Mathematics and Computing (Computational Science and Engineering), Yonsei University, Seoul, Korea.
Department of Cancer Control and Population Health, Graduate School of Cancer Science and Policy, National Cancer Center, Goyang, Korea.
Stat Med. 2021 Feb 10;40(3):799-822. doi: 10.1002/sim.8803. Epub 2020 Nov 17.
The joinpoint regression model (JRM) is used to describe trend changes in many applications and relies on the detection of joinpoints (changepoints). However, the existing joinpoint detection methods, namely the grid search (GS)-based methods, are computationally demanding, which limits the maximum number of joinpoints that can be computed. Herein, we developed a genetic algorithm-based joinpoint (GAJP) model in which an explicitly decoupled computing procedure for optimization and regression is used to embed a binary genetic algorithm into the JRM for optimal joinpoint detection. The combinations of joinpoints were represented as binary chromosomes, and genetic operations were performed to determine the optimum solution by minimizing the fitness function, for which the Bayesian information criterion (BIC) and BIC3 were used. The accuracy and computational performance of the GAJP model were evaluated via intensive simulation studies and compared with those of the GS-based methods using BIC, BIC3, and the permutation test. The proposed method showed outstanding computational efficiency in detecting multiple joinpoints. Finally, the suitability of the GAJP model for the analysis of cancer incidence trends was demonstrated by applying it to data on the incidence of colorectal cancer in the United States from 1975 to 2016, obtained from the National Cancer Institute's Surveillance, Epidemiology, and End Results program. Thus, the GAJP model was concluded to be practically feasible for detecting multiple joinpoints, up to the number of grid points, without the need to preassign the number of joinpoints, and to be easily extendable to cancer trend analysis utilizing large datasets.
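The decoupling of optimization and regression described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`bic_fitness`, `gajp_sketch`), the hinge-function parameterization of the joinpoint model, the particular BIC formula, and all genetic algorithm settings (tournament selection, single-point crossover, bit-flip mutation, elitism, population size, rates) are assumptions chosen for illustration. Each chromosome is a binary vector over the interior grid points, where a 1 marks a joinpoint; the regression step scores a chromosome, and the genetic step searches over chromosomes.

```python
import numpy as np


def bic_fitness(chrom, x, y):
    """Score one chromosome: fit a continuous piecewise-linear model with
    joinpoints at the interior grid points where chrom == 1, return a BIC.
    The BIC form used here is an assumption, not taken from the paper."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    knots = x[1:-1][np.asarray(chrom, dtype=bool)]
    # Design matrix: intercept, slope, one hinge term max(x - tau, 0) per joinpoint.
    cols = [np.ones_like(x), x] + [np.maximum(x - t, 0.0) for t in knots]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ beta) ** 2))
    n = len(x)
    n_params = 2 + 2 * len(knots)  # regression coefficients plus joinpoint locations
    return float(np.log(max(sse, 1e-12) / n) + n_params * np.log(n) / n)


def gajp_sketch(x, y, pop_size=40, n_gen=60, p_mut=0.02, seed=0):
    """Binary genetic algorithm over joinpoint combinations (illustrative only)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    n_genes = len(x) - 2  # one gene per interior grid point
    pop = (rng.random((pop_size, n_genes)) < 0.1).astype(int)
    best, best_fit = None, np.inf
    for _ in range(n_gen):
        # Regression step: evaluate every chromosome independently.
        fit = np.array([bic_fitness(c, x, y) for c in pop])
        i = int(np.argmin(fit))
        if fit[i] < best_fit:
            best, best_fit = pop[i].copy(), float(fit[i])
        # Binary tournament selection.
        a = rng.integers(0, pop_size, pop_size)
        b = rng.integers(0, pop_size, pop_size)
        parents = pop[np.where(fit[a] < fit[b], a, b)]
        # Single-point crossover between consecutive pairs of parents.
        children = parents.copy()
        for j in range(0, pop_size - 1, 2):
            cut = int(rng.integers(1, n_genes))
            children[j, cut:] = parents[j + 1, cut:]
            children[j + 1, cut:] = parents[j, cut:]
        # Bit-flip mutation.
        flips = rng.random(children.shape) < p_mut
        children = np.where(flips, 1 - children, children)
        children[0] = best  # elitism: carry the best chromosome forward
        pop = children
    return x[1:-1][best.astype(bool)], best_fit
```

Because the fitness evaluation touches only one chromosome at a time, the regression step could be swapped for a different model or criterion without changing the search loop, which is the practical benefit of keeping the two procedures separate. Note that the number of joinpoints is not fixed in advance; it is simply the number of 1 bits in the selected chromosome.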