Lim Jiekee, Li Jieyun, Feng Xiao, Feng Lu, Xiao Xinang, Zhou Mi, Yang Hong, Xu Zhaoxia
School of Traditional Chinese Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, PR China.
The First Affiliated Hospital, Guangzhou University of Traditional Chinese Medicine, Guangzhou, PR China.
Heliyon. 2024 Jul 26;10(15):e35283. doi: 10.1016/j.heliyon.2024.e35283. eCollection 2024 Aug 15.
Traditional Chinese Medicine (TCM) offers individualized treatment for polycystic ovary syndrome (PCOS) through pattern differentiation, but the subjectivity of TCM diagnosis can lead to inconsistent outcomes. Machine learning (ML) can provide an objective basis to support TCM diagnosis. This study evaluates feature selection techniques and multi-label ML algorithms in order to develop a predictive model for classifying TCM patterns in patients with PCOS, with the goal of improving diagnostic standardization and treatment personalization.
The study used a dataset of 432 patients with PCOS, each exhibiting one or more of five TCM patterns. Feature selection began with Variance Thresholding (VT), followed by a comparison of five further techniques: Statistical Analysis Test, Recursive Feature Elimination with Cross-Validation (RFECV), Least Absolute Shrinkage and Selection Operator (LASSO) Regression, BorutaShap, and ReliefF. To identify the most effective model for predicting PCOS TCM patterns, four ML algorithms were evaluated on the selected feature sets: Support Vector Machine, Logistic Regression, Extreme Gradient Boosting (XGBoost), and Artificial Neural Networks.
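The variance thresholding step described above can be illustrated with a minimal sketch. The toy data, the function name `variance_threshold`, and the cutoff of 0.1 are assumptions made for illustration; the abstract does not state the threshold the authors used, and this is not their implementation.

```python
from statistics import pvariance
from typing import List, Tuple


def variance_threshold(
    rows: List[List[float]], threshold: float = 0.0
) -> Tuple[List[int], List[List[float]]]:
    """Keep only the feature columns whose population variance exceeds `threshold`.

    `rows` holds one list of feature values per patient.
    Returns the kept column indices and the reduced data matrix.
    """
    n_features = len(rows[0])
    # Population variance of each feature column across all patients.
    variances = [pvariance([row[j] for row in rows]) for j in range(n_features)]
    kept = [j for j, v in enumerate(variances) if v > threshold]
    reduced = [[row[j] for j in kept] for row in rows]
    return kept, reduced


# Hypothetical toy data: 4 patients, 4 features.
# Column 0 is constant and column 3 varies only slightly.
rows = [
    [1.0, 0.0, 1.0, 0.2],
    [1.0, 1.0, 1.0, 0.4],
    [1.0, 0.0, 1.0, 0.6],
    [1.0, 1.0, 0.0, 0.8],
]

# The 0.1 cutoff is an assumed value chosen only for this example.
kept, reduced = variance_threshold(rows, threshold=0.1)
print(kept)     # indices of the retained feature columns
print(reduced)  # data restricted to those columns
```

Near-constant features carry little information for separating patterns, so removing them first shrinks the search space for the more expensive selection methods that follow.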
VT reduced the feature count from 224 to 174. RFECV was the most effective feature selection method, identifying 67 key features. With the RFECV-selected features, XGBoost was the top-performing model, achieving the best testing accuracy (0.7870), F1 score (0.9519), and Hamming loss (0.0481).
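To make the three reported metrics concrete, the following sketch computes them on a small hypothetical multi-label example (4 patients, 5 patterns). The abstract does not state how accuracy or the F1 score were averaged. Exact-match (subset) accuracy and a micro-averaged F1 are assumptions here, and the toy labels are invented for illustration.

```python
from typing import List

Labels = List[List[int]]  # one row per patient, one 0/1 column per TCM pattern


def subset_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of patients whose whole label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return exact / len(y_true)


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual patient-pattern labels that are wrong (lower is better)."""
    wrong = sum(ti != pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    return wrong / (len(y_true) * len(y_true[0]))


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    """F1 score computed from true/false positives pooled over all labels."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        for ti, pi in zip(t, p):
            if ti == 1 and pi == 1:
                tp += 1
            elif ti == 0 and pi == 1:
                fp += 1
            elif ti == 1 and pi == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels: rows are patients, columns are the five patterns.
y_true = [[1, 0, 0, 1, 0], [0, 1, 0, 0, 0], [1, 1, 0, 0, 1], [0, 0, 1, 0, 0]]
y_pred = [[1, 0, 0, 1, 0], [0, 1, 0, 0, 0], [1, 0, 0, 0, 1], [0, 0, 1, 1, 0]]

acc = subset_accuracy(y_true, y_pred)
loss = hamming_loss(y_true, y_pred)
f1 = micro_f1(y_true, y_pred)
print(acc, loss, f1)
```

In this toy case two of four patients are predicted exactly, while only 2 of 20 individual labels are wrong. This shows why exact-match accuracy can sit well below label-level scores in a multi-label setting.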
The RFECV-XGBoost model proved effective for classifying TCM patterns in PCOS. The findings underline the importance of careful feature selection and the potential of ML to advance TCM pattern diagnosis, marking a step toward more precise and personalized healthcare.