Park Sa-Yoon, Park Musun, Lee Won-Yung, Lee Choong-Yeol, Kim Ji-Hwan, Lee Siwoo, Kim Chang-Eop
Department of Physiology, College of Korean Medicine, Gachon University, Seongnam, Republic of Korea.
Department of Sasang Constitutional Medicine, Gil Hospital of Korean Medicine, Gachon University, Incheon, Republic of Korea.
Integr Med Res. 2021 Sep;10(3):100668. doi: 10.1016/j.imr.2020.100668. Epub 2020 Sep 30.
Despite the importance of accurate Sasang type diagnosis in Sasang constitutional medicine, a unique form of Korean medicine, there have been concerns about consistency among diagnoses. We investigated a data-driven integrative diagnostic model by applying machine learning to a multicenter clinical dataset with comprehensive features.
Extremely randomized trees (ERT), support vector machines, multinomial logistic regression, and k-nearest neighbors were applied, and their performance was evaluated by cross-validation. The feature importance of the classifier was analyzed to identify which information is crucial for diagnosis.
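The comparison described above can be sketched with scikit-learn. This is only an illustration under stated assumptions, not the authors' pipeline: the data are synthetic (`make_classification` stands in for the clinical dataset), and the five-fold stratified split, macro-averaged F1 scoring, feature scaling, and all hyperparameters are choices made for the example rather than details given in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the clinical dataset (assumption): 3 classes, 20 features.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=6, n_classes=3, random_state=0
)

# The four classifier families named in the text; hyperparameters are illustrative.
models = {
    "ERT": ExtraTreesClassifier(n_estimators=200, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "MLR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Stratified 5-fold cross-validation with macro-averaged F1 (both assumptions).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: F1 = {scores.mean():.2f} ± {scores.std():.2f}")

# Impurity-based feature importance from the tree ensemble, highest first.
ert = models["ERT"].fit(X, y)
ranking = np.argsort(ert.feature_importances_)[::-1]
print("Top 5 feature indices:", ranking[:5])
```

Reporting the mean and standard deviation across folds mirrors the "score ± spread" form used for the results, although the abstract does not state how its spread was computed.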
The ERT classifier showed the highest performance, with an overall F1 score of 0.60 ± 0.060. The feature classes of body measurement, personality, general information, and cold-heat were more decisive than others in classifying Sasang types. Costal angle was the most informative feature. Pairwise classification of the Taeeum (TE), Soeum (SE), and Soyang (SY) types revealed type-dependent distinctions: body measurement features played a key role in the TE-SE and TE-SY datasets, while personality and cold-heat features were important in the SE-SY dataset.
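The pairwise analysis can be illustrated in the same spirit: restrict the data to two types at a time, fit a tree ensemble, and rank its features. The sketch below is hypothetical throughout. The data are synthetic, the feature names (`feature_0`, ...) are placeholders rather than the study's variables, and the use of impurity-based importances from `ExtraTreesClassifier` is an assumption about how such a ranking might be obtained.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic three-class data labelled with the type abbreviations (assumption).
labels = np.array(["TE", "SE", "SY"])
X, y_idx = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)
y = labels[y_idx]
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

top_features = {}
for a, b in combinations(labels, 2):
    # Keep only the samples belonging to the two types being compared.
    mask = np.isin(y, [a, b])
    clf = ExtraTreesClassifier(n_estimators=200, random_state=0)
    clf.fit(X[mask], y[mask])
    # Rank features by importance for this pair, highest first.
    order = np.argsort(clf.feature_importances_)[::-1]
    top_features[f"{a}-{b}"] = [feature_names[i] for i in order[:3]]
    print(f"{a}-{b}: {top_features[f'{a}-{b}']}")
```

Comparing the top-ranked features across the three pairs is what would expose type-dependent distinctions of the kind reported above.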
The current study investigated a comprehensive diagnostic model for Sasang type using machine learning and achieved better performance than previous studies. By revealing the key features that contribute to Sasang type diagnosis, it supports data-driven decision making in clinics.