The Key Laboratory of Machine Perception (Ministry of Education), School of EECS, Peking University, Beijing, China.
Civil Aviation Medicine Center, Civil Aviation Administration of China, Beijing, China.
BMJ Open. 2017 Sep 24;7(9):e015443. doi: 10.1136/bmjopen-2016-015443.
Esophageal squamous cell carcinoma (ESCC) is the predominant form of esophageal carcinoma, with an extremely aggressive nature and a low survival rate. The risk factors for ESCC in the high-incidence areas of China remain unclear. We used machine learning methods to investigate whether alterations in the serum levels of certain chemical elements are associated with ESCC.
Primary healthcare unit in a city in Henan Province, China.
100 patients with ESCC and 100 healthy controls matched for age, sex and region were included.
The primary outcome was classification accuracy. The secondary outcome was the p value of the t-test or rank-sum test.
Both traditional statistical methods (the t-test and the rank-sum test) and popular machine learning approaches were employed.
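As a rough illustration of the two traditional tests named above, the following is a minimal sketch using only the Python standard library. It is not the authors' analysis code: the function names and the toy serum values are hypothetical, Welch's unequal-variance form of the t statistic is an assumption (the abstract does not say which variant was used), and the rank-sum p value relies on a normal approximation without tie correction.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def rank_sum_z(a, b):
    """Wilcoxon rank-sum z score for sample `a` (normal approximation, no tie correction)."""
    values = sorted(list(a) + list(b))
    rank_of = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        # Tied values share the mean of the 1-based ranks i+1 .. j.
        rank_of[values[i]] = (i + 1 + j) / 2
        i = j
    n1, n2 = len(a), len(b)
    w = sum(rank_of[v] for v in a)  # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean_w) / sd_w


def two_sided_p_from_z(z):
    """Two-sided p value for a standard normal z score."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical serum levels of one element (arbitrary units), NOT data from the study.
patients = [1.0, 2.0, 3.0]
controls = [4.0, 5.0, 6.0]

t_stat = welch_t(patients, controls)
z_stat = rank_sum_z(patients, controls)
p_rank_sum = two_sided_p_from_z(z_stat)
print(t_stat, z_stat, p_rank_sum)
```

Converting the t statistic into a p value requires the t distribution, which the standard library does not provide; a statistics package would normally be used for that step.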
Random Forest achieved the best accuracy of 98.38% on the original feature vectors (without dimensionality reduction), and the support vector machine outperformed the other classifiers on embedding spaces (with dimensionality reduction), yielding an accuracy of 96.56%. All six classifiers achieved accuracies above 90% using only the single most important element, Sr. The other two elements with distinctive differences were S and P, providing accuracies around 80%. More than half of the chemical elements were found to differ significantly between patients with ESCC and the controls.
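To make the idea of classifying on a single element concrete, here is a minimal sketch of a one-feature threshold classifier (a decision stump) scored by leave-one-out cross-validation. This is a stand-in chosen for brevity, not one of the six classifiers used in the study, and the abstract does not state the validation scheme, so leave-one-out is an assumption. All function names and the toy values are hypothetical.

```python
def predict(x, threshold, polarity):
    """Predict class 1 when x lies on the `polarity` side of the threshold, else class 0."""
    return 1 if polarity * (x - threshold) > 0 else 0


def accuracy(xs, ys, threshold, polarity):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(x, threshold, polarity) == y for x, y in zip(xs, ys))
    return hits / len(xs)


def fit_stump(xs, ys):
    """Pick the (threshold, polarity) pair with the highest training accuracy."""
    distinct = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(p + q) / 2 for p, q in zip(distinct, distinct[1:])] or [distinct[0]]
    best_threshold, best_polarity, best_acc = candidates[0], 1, -1.0
    for t in candidates:
        for pol in (1, -1):
            acc = accuracy(xs, ys, t, pol)
            if acc > best_acc:
                best_threshold, best_polarity, best_acc = t, pol, acc
    return best_threshold, best_polarity


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, fit on the rest, and score the held-out prediction."""
    hits = 0
    for i in range(len(xs)):
        t, pol = fit_stump(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        hits += predict(xs[i], t, pol) == ys[i]
    return hits / len(xs)


# Hypothetical serum levels of one element (arbitrary units), NOT data from the study.
levels = [1.0, 1.2, 1.1, 0.9, 2.0, 2.2, 2.1, 1.9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = control, 1 = patient

print(fit_stump(levels, labels))
print(leave_one_out_accuracy(levels, labels))
```

The toy groups are perfectly separable by construction, so the resulting score says nothing about the accuracies reported in the study.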
These results suggest clear differences between patients with ESCC and controls, implying potential applications in the diagnosis, prognosis, pharmacy and nutrition of ESCC. However, the results should be interpreted with caution because of the retrospective design, the limited sample size and the lack of data on several potential confounding factors (including obesity, nutritional status, fruit and vegetable consumption, and potential regional carcinogen contact).