Zou Yingchang, Wang Yu, Jiang Zaile, Zhou Yuan, Chen Ying, Hu Yanjie, Jiang Guobao, Xie Duan
School of Electronic Information and Electrical Engineering, Changsha University, Changsha 410003, China.
Research Center for Healthcare Data Science, Zhijiang Lab, Hangzhou, China.
Lung Cancer. 2021 Apr;154:206-213. doi: 10.1016/j.lungcan.2021.01.020. Epub 2021 Jan 30.
Lung cancer remains the leading cause of cancer-related death, largely because it lacks specific symptoms at an early stage. A large-scale screening method may be the key to finding asymptomatic patients and thereby reducing mortality.
An alternative method combining a breath test with a machine learning algorithm is proposed. A total of 236 breath samples were analyzed by thermal desorption gas chromatography-mass spectrometry (TD-GCMS). The breath profile of each sample consists of 308 features extracted from the chromatogram. A gradient boosting decision tree algorithm was employed to recognize lung cancer patients. Bootstrapping was performed to simulate real diagnostic practice, which allowed us to evaluate the confidence of the method.
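The evaluation scheme described above (k-fold cross-validation with a bootstrap inside each fold to flag "confident" predictions) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not define how a sample is marked confident, so the agreement-threshold rule, the function names, the number of bootstrap rounds, and the synthetic data are all assumptions. A nearest-centroid classifier stands in for the gradient boosting decision tree model so that the sketch needs only the Python standard library.

```python
import random


def train_centroid(X, y):
    """Stand-in classifier (nearest class centroid); returns a predict function.
    Assumption: a gradient boosted tree model would be trained here in practice."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]

    def predict(x):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return predict


def kfold_indices(n, k, rng):
    """Shuffle the sample indices and split them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def bootstrap_cv(X, y, train_fn, k=6, n_boot=25, threshold=0.9, seed=0):
    """For each fold, train n_boot models on bootstrap resamples of the training
    part and let them vote on every held-out sample. A sample is 'confident'
    (hypothetical rule) if the majority label gets at least `threshold` of votes."""
    rng = random.Random(seed)
    results = []  # (true_label, majority_vote, is_confident)
    for test_idx in kfold_indices(len(X), k, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        votes = {i: [] for i in test_idx}
        for _ in range(n_boot):
            # Resample the training indices with replacement.
            sample = [rng.choice(train_idx) for _ in train_idx]
            predict = train_fn([X[i] for i in sample], [y[i] for i in sample])
            for i in test_idx:
                votes[i].append(predict(X[i]))
        for i in test_idx:
            majority = max(set(votes[i]), key=votes[i].count)
            agreement = votes[i].count(majority) / n_boot
            results.append((y[i], majority, agreement >= threshold))
    return results


def summarize(results):
    """Return (overall accuracy, fraction confident, accuracy on confident samples)."""
    overall = sum(t == p for t, p, _ in results) / len(results)
    confident = [(t, p) for t, p, c in results if c]
    frac = len(confident) / len(results)
    conf_acc = (sum(t == p for t, p in confident) / len(confident)
                if confident else float("nan"))
    return overall, frac, conf_acc


def make_synthetic(n, n_features, seed=0, shift=2.0):
    """Synthetic two-class data (illustration only, not breath measurements)."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(shift * label, 1.0) for _ in range(n_features)] for label in y]
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic(60, 5, seed=1)
    overall, frac, conf_acc = summarize(bootstrap_cv(X, y, train_centroid, n_boot=15))
    print(f"accuracy={overall:.2f} confident={frac:.2f} confident_accuracy={conf_acc:.2f}")
```

Because `bootstrap_cv` only requires a `train_fn` that returns a predict function, a gradient boosting model from any library could be wrapped and passed in place of `train_centroid`.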
An accuracy of 85% was achieved in 6-fold cross-validation. In the statistical bootstrap, 72% of samples were marked as "confident", and the accuracy on confident samples was 93% across the cross-validation folds.
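As a rough consistency check on these figures, overall accuracy is a weighted average of the accuracy on confident and non-confident samples. The sketch below solves that identity for the non-confident group. It assumes that the 85% and 93% figures refer to the same pool of predictions and ignores rounding in the reported values, so the result is only indicative and is not a number reported by the authors. The function name is hypothetical.

```python
def implied_rest_accuracy(overall_acc, confident_frac, confident_acc):
    """Solve overall = f * acc_confident + (1 - f) * acc_rest for acc_rest."""
    return (overall_acc - confident_frac * confident_acc) / (1.0 - confident_frac)


# Reported figures: 85% overall, 72% of samples confident, 93% accuracy on those.
rest = implied_rest_accuracy(0.85, 0.72, 0.93)
print(f"{rest:.3f}")  # prints 0.644
```

Under these assumptions, the samples not marked confident would be classified correctly about 64% of the time, which illustrates why separating them out raises the accuracy of the remaining predictions.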
We propose a non-invasive, accurate method with an explicit confidence estimate that might contribute to large-scale screening for lung cancer. As a consequence, more asymptomatic patients with early-stage lung cancer may be detected.