Zhu Jing, Yang Chenxi, Song Siyu, Wang Ruting, Gu Liqiang, Chen Zhongjian
Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Hangzhou, Zhejiang, 310022, China.
Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Hangzhou, Zhejiang, 310022, China; Zhejiang Key Laboratory of Diagnosis & Treatment Technology on Thoracic Oncology (Lung and Esophagus), Hangzhou, Zhejiang, 310022, China.
Anal Biochem. 2023 May 15;669:115120. doi: 10.1016/j.ab.2023.115120. Epub 2023 Mar 24.
Near-infrared spectroscopy (NIRS) is a non-invasive and convenient tool that captures features related to the chemical components of biological samples. Machine learning (ML) is now widely used in medical diagnosis. This study investigated a novel cancer diagnosis strategy based on ML modeling of NIRS data.
Plasma samples were collected from 247 participants, comprising patients with lung cancer, cervical cancer, or nasopharyngeal cancer and healthy controls, and were randomly split into a training set and a test set. After NIRS analysis, the training set was used to train ML models, including partial least squares (PLS), random forest (RF), gradient boosting machine (GBM), and support vector machine (SVM). The prediction performance of these models was then evaluated on the test set.
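The split, train, and test workflow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not report the split ratio, spectral preprocessing, or model hyperparameters, so the 30% test fraction, the synthetic "spectra", and every function name here are hypothetical. A nearest-centroid classifier stands in for the models named above (PLS, RF, GBM, SVM) only so that the example runs with the Python standard library.

```python
import random
from collections import defaultdict

CLASSES = ["control", "lung", "cervical", "nasopharyngeal"]


def make_synthetic(n_per_class=30, n_features=20, seed=0):
    """Generate toy 'spectra' (assumption: NOT real NIRS data)."""
    rng = random.Random(seed)
    X, y = [], []
    for k, label in enumerate(CLASSES):
        for _ in range(n_per_class):
            # Each class gets a fixed offset on its own subset of features.
            row = [rng.gauss(0, 1) + (3.0 if j % 4 == k else 0.0)
                   for j in range(n_features)]
            X.append(row)
            y.append(label)
    return X, y


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Randomly split sample indices into train/test, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_centroids(X, y):
    """Compute the mean feature vector of each class (stand-in model)."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=dist2)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def to_binary(label):
    """Collapse the three cancer types into one 'cancer' class."""
    return "control" if label == "control" else "cancer"


X, y = make_synthetic()
train_idx, test_idx = stratified_split(y)

# Fit on the training set only, then evaluate on the held-out test set.
centroids = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
y_test = [y[i] for i in test_idx]
y_pred = [predict(centroids, X[i]) for i in test_idx]

acc_four = accuracy(y_test, y_pred)  # quaternary classification
acc_binary = accuracy([to_binary(t) for t in y_test],
                      [to_binary(p) for p in y_pred])  # cancer vs. control
print(f"quaternary accuracy: {acc_four:.2f}, binary accuracy: {acc_binary:.2f}")
```

In this sketch the binary score is derived by collapsing the four-class predictions, so it can never fall below the quaternary score. Whether the study trained separate binary and quaternary models is not stated in the abstract.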
All ML models showed high prediction performance in differentiating cancers from controls, and SVM had high prediction accuracy in distinguishing the different cancer types. SVM was considered the most suitable model because of its minimal computational cost and high accuracy in both binary and quaternary classification.
This strategy of coupling NIRS with ML offers insight that may aid clinical cancer diagnosis, although further studies should validate our results in a larger, more representative cohort.