Department of Biomedical Sciences, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Hong Kong, China.
Massachusetts General Hospital, Boston, MA, USA.
J Neurol Sci. 2022 Sep 15;440:120335. doi: 10.1016/j.jns.2022.120335. Epub 2022 Jul 9.
We conducted a comprehensive evaluation of features associated with stroke records.
We screened dietary nutrients, blood biomarkers, and clinical information from the National Health and Nutrition Examination Survey (NHANES) 2015-16 database to assess a self-reported history of all strokes (136 strokes, n = 4381). We computed feature importance, built machine learning (ML) models, developed a nomogram, and validated the nomogram on NHANES 2007-08, NHANES 2017-18, and the baseline UK Biobank. We calculated the odds ratios with and without adjustment for sampling weights (OR/OR).
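To illustrate how adjusting for sampling weights can change an odds ratio, the following is a minimal sketch that computes a crude odds ratio from a 2x2 table, once with every respondent counted equally and once with per-respondent weights. The function name `odds_ratio`, the toy data, and the weights are assumptions made for illustration; the abstract does not state how the study estimated its odds ratios, which may well have come from a regression model rather than a 2x2 table.

```python
def odds_ratio(exposed, outcome, weights=None):
    """Crude odds ratio of the outcome for exposed vs. unexposed respondents.

    weights: optional per-respondent sampling weights (hypothetical here);
    when omitted, every respondent counts as 1.
    """
    if weights is None:
        weights = [1.0] * len(exposed)
    # a: exposed with outcome, b: exposed without outcome,
    # c: unexposed with outcome, d: unexposed without outcome
    a = b = c = d = 0.0
    for e, y, w in zip(exposed, outcome, weights):
        if e and y:
            a += w
        elif e:
            b += w
        elif y:
            c += w
        else:
            d += w
    return (a * d) / (b * c)


# Toy data (not from the study): 1 = yes, 0 = no.
exposed = [1, 1, 1, 1, 0, 0, 0, 0]
outcome = [1, 1, 0, 0, 1, 0, 0, 0]
toy_weights = [1, 1, 2, 2, 1, 1, 1, 1]

unweighted = odds_ratio(exposed, outcome)             # (2 * 3) / (2 * 1) = 3.0
weighted = odds_ratio(exposed, outcome, toy_weights)  # (2 * 3) / (4 * 1) = 1.5
```

In this toy example, up-weighting the exposed respondents without the outcome halves the odds ratio, which is why survey analyses often report both versions.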
Clinical features had the best predictive power compared with dietary nutrients and blood biomarkers, with a 22.8% higher average area under the receiver operating characteristic curve (AUROC) in the ML models. Restricting the models to the ten most important clinical features did not compromise predictive performance. The key features positively associated with stroke include age, cigarette smoking, tobacco smoking, Caucasian or African American race, hypertension, diabetes mellitus, and asthma history; the negatively associated feature is family income. The nomogram based on these key features achieved good performance (AUROC between 0.753 and 0.822) on the test set, NHANES 2007-08, NHANES 2017-18, and the UK Biobank. Key features from the nomogram model include age (OR = 1.05, OR = 1.06), Caucasian/African American race (OR = 2.68, OR = 2.67), diabetes mellitus (OR = 2.30, OR = 1.99), asthma (OR = 2.10, OR = 2.41), hypertension (OR = 1.86, OR = 2.10), and income (OR = 0.83, OR = 0.81).
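As a rough illustration of how odds ratios such as those above combine, the sketch below multiplies them on the log-odds scale to obtain the odds of a profile relative to a reference profile. It assumes a logistic model without interaction terms, uses the first value of each reported pair, and treats the age and income odds ratios as per-unit effects; the units, the dictionary keys, and the function name `relative_odds` are hypothetical and not taken from the study. Because the model intercept is not reported, this yields relative odds only, not an absolute stroke probability.

```python
import math

# Odds ratios copied from the abstract (first value of each reported pair).
# Key names and the "per unit" interpretation are assumptions for illustration.
REPORTED_OR = {
    "age_per_unit": 1.05,
    "caucasian_or_african_american": 2.68,
    "diabetes": 2.30,
    "asthma": 2.10,
    "hypertension": 1.86,
    "income_per_unit": 0.83,
}


def relative_odds(profile, reference=None):
    """Odds of `profile` relative to `reference` under an additive log-odds model.

    Features missing from either dictionary are treated as 0.
    """
    reference = reference or {}
    log_odds = 0.0
    for name, or_value in REPORTED_OR.items():
        difference = profile.get(name, 0) - reference.get(name, 0)
        log_odds += difference * math.log(or_value)
    return math.exp(log_odds)


# Hypertension plus diabetes vs. neither: 1.86 * 2.30, about 4.28 times the odds.
both_conditions = relative_odds({"hypertension": 1, "diabetes": 1})

# Ten age units above the reference: 1.05 ** 10, about 1.63 times the odds.
ten_units_older = relative_odds({"age_per_unit": 10})
```

A nomogram expresses the same additive structure graphically: each feature contributes points in proportion to its log odds ratio, and the total maps to a predicted risk.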
We identified key clinical features and built high-performing predictive models for assessing stroke records. A nomogram consisting of questionnaire-based variables would help identify stroke survivors and evaluate the potential risk of stroke.