Xu Yiliu, He Zhaoming, Wang Hao
Research Center of Fluid Machinery Engineering & Technology, Jiangsu University, Zhenjiang 212013, China.
Department of Mechanical Engineering, Texas Tech University, Lubbock, TX 79411, USA.
Sensors (Basel). 2025 May 22;25(11):3254. doi: 10.3390/s25113254.
Cuffless continuous blood pressure (BP) monitoring is essential for personal health management. However, its accuracy is challenged by the diversity and heterogeneity of physiological data sources. We propose a multi-source feature selection framework based on Markov blanket theory and the concept of causal invariance. We extracted 218 BP-related photoplethysmography (PPG) features from three heterogeneous datasets (differing in subject population, acquisition device, and acquisition method) and constructed a causal feature set using the Multi-Dataset Stable Feature Selection via Ensemble Markov Blanket (MDSFS-EMB) algorithm. BP was then estimated with four machine learning models. MDSFS-EMB integrates PPFS and HITON-MB, which lets it adapt to different data scales and distribution scenarios, and it uses Gaussian Copula Mutual Information, which is robust to outliers and can model nonlinear relationships. To validate the selected feature set, we ran experiments on an independent external validation dataset and examined how data segmentation strategies affect model predictions. The results showed that MDSFS-EMB offers advantages in feature selection efficiency, prediction accuracy, and generalization capability. This study explores the causal relationships between PPG features and BP across multiple data sources and provides a clinically applicable approach to cuffless BP estimation.
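The abstract attributes outlier robustness and nonlinear modeling to Gaussian Copula Mutual Information (GCMI). A minimal sketch helps show where those properties come from. It assumes the standard univariate GCMI estimator: each variable is rank-transformed and mapped to standard-normal scores (a copula normalization), and then the closed-form mutual information of a bivariate Gaussian, I = -½ log₂(1 − ρ²), is applied to the correlation ρ of those scores. Only ranks survive the transform, so any monotone nonlinearity or any extreme value that keeps its rank leaves the estimate unchanged. The function names are hypothetical, and this is not the authors' implementation. It also omits the bias correction and the multivariate case that a full GCMI toolbox would provide.

```python
import math
from statistics import NormalDist


def copula_normalize(values):
    """Map samples to standard-normal scores through their empirical CDF (ranks)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Group tied values and give each its average 1-based rank.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # Dividing by n + 1 keeps every quantile strictly inside (0, 1).
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


def pearson(a, b):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def gcmi(x, y):
    """Hypothetical helper: Gaussian-copula MI estimate (in bits) for two 1-D samples."""
    rho = pearson(copula_normalize(x), copula_normalize(y))
    # Clamp rho**2 below 1 so perfectly monotone pairs give a large finite value.
    rho2 = min(rho * rho, 1 - 1e-12)
    return -0.5 * math.log2(1 - rho2)


# Toy data (not from the paper): a PPG-like feature and a noisy nonlinear target.
feature = [float(i) for i in range(100)]
target = [math.sin(i) + 0.05 * i for i in range(100)]

mi_raw = gcmi(feature, target)
# A monotone nonlinear transform of the target preserves ranks, hence the estimate.
mi_exp = gcmi(feature, [math.exp(v) for v in target])
# Inflating the largest target value into an extreme outlier also preserves ranks.
outlier_target = list(target)
outlier_target[target.index(max(target))] = 1e9
mi_outlier = gcmi(feature, outlier_target)
```

Because the estimate depends only on ranks, `mi_raw`, `mi_exp`, and `mi_outlier` coincide in this toy example. That invariance is the property that makes a copula-based MI a reasonable dependence measure when PPG features come from devices with different scales and noise characteristics.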