Department of Breast, Head and Neck Surgery, Xinjiang Medical University Affiliated Tumor Hospital, Urumqi, China.
College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China.
Photodiagnosis Photodyn Ther. 2022 Mar;37:102647. doi: 10.1016/j.pdpdt.2021.102647. Epub 2021 Nov 21.
Thyroid carcinoma has the highest diagnosis rate of any cancer in the endocrine system, and its main histological subtype, papillary thyroid carcinoma (PTC), accounts for 80% of thyroid malignancies. In recent years the incidence of thyroid cancer has increased exponentially, and this substantial increase is closely related to the overdiagnosis of papillary microcarcinoma (PMC). Early and accurate discrimination between PTC and PMC can therefore spare patients from overtreatment. This study aimed to distinguish PTC from PMC using Raman spectroscopy. We collected serum Raman spectra from 16 patients with PTC and 31 patients with PMC. First, the imbalanced data were preprocessed with the synthetic minority over-sampling technique (SMOTE). Then, the dimensionality of the balanced data was reduced by principal component analysis (PCA). Finally, the processed data were fed into three classifiers: a single decision tree (DT), a random forest (RF) built on the Bagging ensemble idea, and an Adaptive Boosting (AdaBoost) model built on the Boosting ensemble idea. The classification accuracies of the three models on the test set were 75.38%, 81.54%, and 84.61%, respectively. Compared with the DT classifier, the two ensemble models improved accuracy by 6.16 and 9.23 percentage points, respectively, and AdaBoost performed best. These results demonstrate that serum Raman spectroscopy combined with an ensemble learning algorithm is feasible for rapidly distinguishing PTC from PMC, and that the method has great potential for application in clinical diagnosis.
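The SMOTE step balances the 16 PTC spectra against the 31 PMC spectra by synthesizing new minority-class points rather than duplicating existing ones. The following is a minimal pure-Python sketch of the standard SMOTE interpolation scheme, not the authors' implementation. It assumes Euclidean distance, k = 5 nearest neighbours, and a randomly chosen base sample for each synthetic point. The function name `smote` and the toy 3-feature "spectra" are hypothetical stand-ins for real preprocessed Raman data; libraries such as imbalanced-learn provide a production `SMOTE` class.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: create n_synthetic points by interpolating between
    a random minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [minority[j] for j in range(len(minority)) if j != i]
        # The k minority samples closest to the base sample (Euclidean distance).
        neighbours = sorted(others, key=lambda x: math.dist(base, x))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position along the base-neighbour segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy stand-in data (assumption): 16 minority "spectra" with 3 features each,
# oversampled until the class matches a majority class of 31 samples.
data_rng = random.Random(1)
ptc = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(16)]
new = smote(ptc, n_synthetic=31 - 16, k=5)
print(len(ptc) + len(new))  # 31
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region already spanned by the original spectra; it adds density, not new extremes.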