Cai Zhihua, Xu Dong, Zhang Qing, Zhang Jiexia, Ngai Sai-Ming, Shao Jianlin
Affiliated Cancer Hospital of Guangzhou Medical University, Guangzhou, Guangdong Province, China.
Mol Biosyst. 2015 Mar;11(3):791-800. doi: 10.1039/c4mb00659c. Epub 2014 Dec 16.
Lung cancer is one of the leading causes of death worldwide. There are three major types of lung cancer: non-small cell lung cancer (NSCLC), small cell lung cancer (SCLC) and carcinoid. NSCLC is further classified into lung adenocarcinoma (LADC), squamous cell lung cancer (SQCLC) and large cell lung cancer. Many previous studies have shown that DNA methylation markers are potential lung cancer-specific biomarkers. However, whether a single set of DNA methylation markers can simultaneously distinguish LADC, SQCLC and SCLC remains unclear. In the present study, ROC (receiver operating characteristic) curve analysis, RFs (random forests) and mRMR (maximum relevance and minimum redundancy) were applied to capture unbiased, informative and compact molecular signatures, which were then used with machine learning methods to classify LADC, SQCLC and SCLC. As a result, a panel of 16 DNA methylation markers exhibited strong classification power, with an accuracy of 86.54% and a recall of 84.37% in leave-one-out cross-validation (LOOCV), and an accuracy of 84.6% and a recall of 85.5% on an independent test data set. In addition, comparison results indicate that ensemble-based feature selection methods outperform individual ones when combined with the incremental feature selection (IFS) strategy, in terms of how informative and compact the selected features are. Taken together, the results suggest that the ensemble-based feature selection approach is effective and that a common panel of DNA methylation markers may exist among these three types of lung cancer tissue, which would facilitate clinical diagnosis and treatment.
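The abstract does not give implementation details, so the following is only a minimal sketch of how an ensemble ranking might be combined with incremental feature selection (IFS) and LOOCV. Everything in it is an assumption made for illustration: the rankings are combined by mean rank position, a simple nearest-centroid classifier stands in for the unspecified machine learning methods, and the methylation values, the two input rankings and all function names are hypothetical rather than taken from the study.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean)."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        dist = sum((x[j] - acc[j] / counts[label]) ** 2 for j in range(len(x)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X, y, features):
    """Leave-one-out accuracy using only the given feature indices."""
    Xs = [[row[j] for j in features] for row in X]
    correct = 0
    for i in range(len(Xs)):
        train_X = Xs[:i] + Xs[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += nearest_centroid_predict(train_X, train_y, Xs[i]) == y[i]
    return correct / len(Xs)


def average_rank(rankings):
    """Combine several rankings (best feature first) by mean rank position."""
    n = len(rankings[0])
    mean_pos = [sum(r.index(f) for r in rankings) / len(rankings) for f in range(n)]
    return sorted(range(n), key=lambda f: mean_pos[f])


def incremental_feature_selection(X, y, ranked):
    """Score each top-k prefix of the ranking by LOOCV; keep the smallest best one."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked) + 1):
        acc = loocv_accuracy(X, y, ranked[:k])
        if acc > best_acc:  # strict '>' keeps the most compact panel on ties
            best_k, best_acc = k, acc
    return ranked[:best_k], best_acc


# Hypothetical toy methylation values: 9 samples x 4 candidate markers.
X = [
    [0.90, 0.10, 0.5, 0.4], [0.80, 0.20, 0.4, 0.6], [0.85, 0.15, 0.6, 0.5],
    [0.10, 0.10, 0.5, 0.6], [0.20, 0.20, 0.6, 0.4], [0.15, 0.15, 0.4, 0.5],
    [0.10, 0.90, 0.4, 0.4], [0.20, 0.80, 0.5, 0.5], [0.15, 0.85, 0.6, 0.6],
]
y = ["LADC"] * 3 + ["SQCLC"] * 3 + ["SCLC"] * 3

# Two hypothetical rankings, as if produced by two different selectors.
ranked = average_rank([[0, 1, 2, 3], [1, 0, 3, 2]])
selected, best_acc = incremental_feature_selection(X, y, ranked)
print(selected, best_acc)
```

On this toy data the first two markers already separate the three classes, so IFS stops growing the panel there. The strict comparison in `incremental_feature_selection` is one way to favour a compact panel: a larger prefix is kept only when it improves LOOCV accuracy.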