Chen Dan, Hu Jun, Zhu Mei, Tang Niansheng, Yang Yang, Feng Yuran
Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University, Kunming, 650091 China.
College of Science, Yunnan Agricultural University, Kunming, 650201 China.
BioData Min. 2020 Sep 3;13:14. doi: 10.1186/s13040-020-00223-w. eCollection 2020.
Combinations of ultrasonographic (US) characteristics are increasingly used to classify thyroid nodules. However, these combinations lack a theoretical basis, depend heavily on radiologists' experience, and do not reliably classify thyroid nodules. This study therefore aims to select the US characteristics significantly associated with malignancy and to develop an efficient scoring system that helps ultrasound clinicians identify thyroid malignancy correctly.
A logistic regression (LR) model is used to identify potential thyroid malignancy, and the least absolute shrinkage and selection operator (LASSO) is adopted to simultaneously select the US characteristics significantly associated with malignancy and estimate the parameters of the LR model. Based on the selected US characteristics, we calculate the probability of malignancy for each thyroid nodule via random forest (RF) and extreme learning machine (ELM), and develop a scoring system to classify thyroid nodules. For comparison, we also consider eight state-of-the-art methods, including support vector machine (SVM) and neural network (NET). The area under the receiver operating characteristic curve (AUC) is used to measure the accuracy of the classifiers.
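To illustrate how an L1 penalty can select predictors and estimate LR coefficients in a single fit, the following is a minimal sketch of LASSO logistic regression solved by proximal gradient descent with soft-thresholding. It is not the authors' implementation: the solver, the toy dataset, the function names (`fit_lasso_logistic`, `toy_nodules`), and the values of `lam` and `step` are all assumptions chosen for illustration.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, lam, step=0.5, n_iter=2000):
    """Fit an L1-penalized logistic regression (intercept left unpenalized)."""
    n, p = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed label.
        residuals = [
            sigmoid(intercept + sum(c * x for c, x in zip(coefs, row))) - label
            for row, label in zip(X, y)
        ]
        grad_intercept = sum(residuals) / n
        grads = [sum(r * row[j] for r, row in zip(residuals, X)) / n for j in range(p)]
        # Gradient step on the log-loss, then soft-threshold the coefficients.
        intercept -= step * grad_intercept
        coefs = [soft_threshold(c - step * g, step * lam) for c, g in zip(coefs, grads)]
    return intercept, coefs


def toy_nodules():
    """Hypothetical data: feature 0 is informative, feature 1 is pure noise."""
    cells = [  # (feature_0, feature_1, malignant, count)
        (1, 0, 1, 8), (1, 1, 1, 8), (1, 0, 0, 2), (1, 1, 0, 2),
        (0, 0, 0, 8), (0, 1, 0, 8), (0, 0, 1, 2), (0, 1, 1, 2),
    ]
    X, y = [], []
    for a, b, label, count in cells:
        X.extend([[a, b]] * count)
        y.extend([label] * count)
    return X, y


X, y = toy_nodules()
intercept, coefs = fit_lasso_logistic(X, y, lam=0.05)
print("intercept:", intercept, "coefficients:", coefs)
```

In this toy setting the penalty keeps the coefficient of the uninformative feature at exactly zero while the informative feature receives a positive coefficient, which is the selection behaviour the LASSO is used for here. In practice the penalty strength would be tuned, for example by cross-validation.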
The LASSO LR (LLR) selects the following US characteristics as significant predictors of thyroid malignancy: nodule size, anteroposterior-to-transverse diameter ratio (AP/T) ≥ 1, solid component, micro-calcifications, hackly border, hypoechogenicity, presence of halo, unclear border, irregular margin, and central vascularity. Using the developed scoring system, thyroid nodules are classified into four categories: benign, low suspicion, intermediate suspicion, and high suspicion. On the testing dataset, the rates of correctly identified malignancy in these categories for the RF (ELM) method are 0.0% (4.3%), 14.3% (50.0%), 58.1% (59.1%), and 96.1% (97.7%), respectively.
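The mapping from a predicted probability to one of the four risk categories, and the malignancy rate observed within each category, can be sketched as follows. The cutoff values in `CUTOFFS` and the example probabilities are hypothetical; the abstract does not state the thresholds actually used, so this only shows the mechanics.

```python
from collections import defaultdict

CATEGORIES = ["benign", "low suspicion", "intermediate suspicion", "high suspicion"]
CUTOFFS = (0.1, 0.4, 0.7)  # hypothetical upper bounds for the first three categories


def risk_category(probability, cutoffs=CUTOFFS):
    """Return the first category whose upper bound exceeds the probability."""
    for name, upper in zip(CATEGORIES, cutoffs):
        if probability < upper:
            return name
    return CATEGORIES[-1]


def malignancy_rate_by_category(probabilities, labels, cutoffs=CUTOFFS):
    """Fraction of malignant nodules (label 1) within each non-empty category."""
    counts = defaultdict(lambda: [0, 0])  # category -> [malignant, total]
    for prob, label in zip(probabilities, labels):
        entry = counts[risk_category(prob, cutoffs)]
        entry[0] += label
        entry[1] += 1
    return {
        name: counts[name][0] / counts[name][1]
        for name in CATEGORIES
        if name in counts
    }


# Hypothetical predicted probabilities and pathology labels (1 = malignant).
example_probs = [0.05, 0.08, 0.5, 0.6, 0.9, 0.95]
example_labels = [0, 0, 1, 0, 1, 1]
rates = malignancy_rate_by_category(example_probs, example_labels)
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

A well-calibrated scoring system should show malignancy rates that rise from the benign category to the high-suspicion category, as the reported rates do.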
In terms of risk scores, LLR combined with RF outperforms the other methods in identifying malignancy, especially for abnormal nodules. The developed scoring system predicts the risk of malignancy well and can guide physicians in making management decisions that reduce the number of unnecessary biopsies of benign nodules.