Gong Lejun, Jiang Jindou, Chen Shiqi, Qi Mingming
Jiangsu Key Lab of Big Data Security and Intelligent Processing, School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China.
Smart Health Big Data Analysis and Location Services Engineering Lab of Jiangsu Province, Nanjing, China.
Front Genet. 2023 Oct 3;14:1272016. doi: 10.3389/fgene.2023.1272016. eCollection 2023.
Syndrome differentiation and treatment is the basic principle by which traditional Chinese medicine (TCM) recognizes and treats diseases. Accurate syndrome differentiation provides a reliable basis for treatment; establishing a scientific, intelligent syndrome differentiation method is therefore of great significance to the modernization of TCM. With the development of biomedical text mining technology, TCM has entered a data-driven era of intelligence, and model training increasingly relies on large-scale labeled data. However, it is difficult to build a large standard data set in the field of TCM, because TCM data collection is poorly standardized and patients' medical records are subject to privacy protection. To address this problem, ML-PRDF, a multi-label deep forest model based on an improved multi-label ReliefF feature selection algorithm, is proposed. It is designed to enhance the representativeness of features within the model, express the original information with fewer features, and achieve optimal classification accuracy, while alleviating the high data processing cost of deep forest models and enabling effective TCM discriminative analysis with small samples. The results show that the proposed model outperforms other multi-label classification models on multi-label evaluation criteria and achieves higher accuracy on the TCM syndrome differentiation problem than the traditional multi-label deep forest. The comparative study also shows that feature selection with the PCC-MLRF algorithm better selects representative features.