Yoon Hoon-Seok, Kim Yoon-Chul
Medical Artificial Intelligence Laboratory, Division of Digital Healthcare, College of Software and Digital Healthcare Convergence, Yonsei University, Wonju, Korea.
Healthc Inform Res. 2025 Jul;31(3):284-294. doi: 10.4258/hir.2025.31.3.284. Epub 2025 Jul 31.
The objective of this study was to evaluate the effectiveness of machine learning (ML) models using selected subsets of features to predict age based on intracranial arterial segments' tortuosity and diameter characteristics derived from magnetic resonance angiography (MRA) data. Additionally, this study aimed to identify key vascular features important for predicting vascular age.
Three-dimensional time-of-flight MRA image data from 171 subjects were analyzed. After the endpoints of each arterial segment were annotated, 169 features, comprising tortuosity metrics and arterial segment diameter statistics, were extracted. Five ML models (random forest, linear regression, AdaBoost, XGBoost, and LightGBM) were trained and validated. Two feature selection methods, correlation-based feature selection (CFS) and Relief-F, were applied to identify optimal feature subsets.
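The idea of keeping a fixed fraction of features ranked by a correlation criterion can be illustrated with a minimal sketch. This is not the authors' pipeline: it ranks each feature by its absolute Pearson correlation with age, which is a simplified univariate stand-in for CFS (the full CFS heuristic also penalizes redundancy between features). The function names, feature names, and toy values below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_top_fraction(features, target, fraction=0.5):
    """Rank features by |correlation with target| and keep the top fraction.

    features: dict mapping feature name -> list of values (one per subject).
    This is a simplified univariate ranking, not the full CFS merit heuristic.
    """
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical toy data: five subjects, four features.
ages = [30, 40, 50, 60, 70]
toy_features = {
    "tortuosity_a": [1.0, 1.1, 1.2, 1.3, 1.4],
    "diameter_b": [3.0, 2.9, 3.1, 2.8, 3.0],
    "tortuosity_c": [1.5, 1.2, 1.6, 1.1, 1.4],
    "diameter_d": [4.0, 3.8, 3.5, 3.3, 3.0],
}
selected = select_top_fraction(toy_features, ages, fraction=0.5)
print(selected)
```

In this toy example the two features that vary monotonically with age are retained; the selected subset would then be passed to a regressor such as a random forest.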
The random forest model utilizing the CFS-based 50% feature subset achieved the best performance, with a root mean square error of 14.0 years, a coefficient of determination (R²) of 0.275, and a Pearson correlation coefficient of 0.560. Tortuosity metrics (e.g., triangular index of the left posterior cerebral artery P1 segment) appeared more frequently than diameter statistics among the top five most important features.
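For readers unfamiliar with the three reported metrics, the following sketch computes them from true and predicted ages. It assumes the common definitions (R² as one minus the ratio of residual to total sum of squares); the abstract does not state which definitions the authors used, and the toy values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the target (years)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(y_true, y_pred):
    """Pearson correlation between true and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical toy predictions that are compressed toward the mean age.
true_ages = [30, 40, 50, 60, 70]
predicted_ages = [40, 45, 50, 55, 60]

error = rmse(true_ages, predicted_ages)
r2 = r_squared(true_ages, predicted_ages)
r = pearson_r(true_ages, predicted_ages)
print(error, r2, r)
```

Under these definitions, R² on held-out predictions need not equal the square of the Pearson coefficient: predictions that are perfectly correlated with the true ages but compressed toward the mean, as in the toy data, give r = 1 with R² below 1. The reported values (0.560² ≈ 0.314 versus R² = 0.275) are consistent with that distinction.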
CFS-based feature selection enhanced the performance of ML-based age prediction compared with using the complete feature set. Linear regression consistently demonstrated the poorest performance across all evaluation metrics. ML-based age prediction using segmental tortuosity metrics and diameter statistics is feasible, potentially revealing significant features related to vascular aging.