Abedi Vida, Lambert Clare, Chaudhary Durgesh, Rieder Emily, Avula Venkatesh, Hwang Wenke, Li Jiang, Zand Ramin
Department of Molecular and Functional Genomics, Weis Center for Research, Geisinger Health System, Danville, PA 17822, USA.
Department of Public Health Sciences, College of Medicine, The Pennsylvania State University, Hershey, PA 17033, USA.
J Clin Med. 2023 Mar 30;12(7):2600. doi: 10.3390/jcm12072600.
The cut-point for defining the age of young ischemic stroke (IS) is clinically and epidemiologically important, yet it is arbitrary and differs across studies. In this study, we leveraged electronic health records (EHRs) and data science techniques to estimate an optimal cut-point for defining the age of young IS. Patient-level EHRs were extracted from 13 hospitals in Pennsylvania and used in two parallel approaches. The first approach used ICD-9/10 codes from IS patients to group comorbidities and computed similarity scores between every pair of patients. We determined the optimal age of young IS by analyzing the trend in patient similarity, with respect to clinical profile, across ages at index IS. The second approach used the IS cohort and a control group (patients without IS) to build three sets of machine-learning models, namely generalized linear regression (GLM), random forest (RF), and XGBoost (XGB), that classify patients in seventeen age groups. After extracting feature importance from the models, we determined the optimal age of young IS by analyzing the pattern of comorbidities with respect to the age at index IS. Both approaches were completed separately for male and female patients. The stroke cohort contained 7555 IS cases, and the control group included 31,067 patients. In the first approach, the optimal age of young stroke was 53.7 and 51.0 years in female and male patients, respectively. In the second approach, we created 102 models, based on three algorithms, 17 age brackets, and two sexes. The optimal age was 53 (GLM), 52 (RF), and 54 (XGB) for female, and 52 (GLM and RF) and 53 (RF) for male patients. Different age and sex groups exhibited different comorbidity patterns. Using a data-driven approach, we determined the age of young stroke to be 54 years for women and 52 years for men in our mainly rural population in central Pennsylvania. Future validation studies should include more diverse populations.
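As a rough illustration of the two approaches described above, the following Python sketch computes a mean pairwise similarity over patients' comorbidity groups and enumerates the 3 x 17 x 2 model grid. The abstract does not name the similarity measure, so the Jaccard index used here is an assumption, and the function names, bracket indices, and toy profiles are hypothetical rather than taken from the study.

```python
from itertools import combinations, product


def jaccard(a, b):
    """Jaccard similarity between two sets of comorbidity groups.

    Assumption: the study's actual similarity score is not specified
    in the abstract; Jaccard is used purely for illustration.
    """
    if not a and not b:
        return 1.0  # two empty profiles are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_similarity(profiles):
    """Average similarity over every pair of patient profiles."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return float("nan")  # fewer than two patients: undefined
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy profiles for patients sharing one age at index IS.
toy_profiles = [
    {"hypertension", "diabetes"},
    {"hypertension", "diabetes"},
    {"migraine"},
]
toy_similarity = mean_pairwise_similarity(toy_profiles)

# Second approach: one model per (algorithm, age bracket, sex).
algorithms = ["GLM", "RF", "XGB"]
age_brackets = list(range(17))  # placeholder indices for the 17 brackets
sexes = ["female", "male"]
model_grid = list(product(algorithms, age_brackets, sexes))
```

In this sketch, repeating `mean_pairwise_similarity` for each age at index IS would give the similarity trend that the first approach analyzes, and `model_grid` has one entry for each of the 102 models reported.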