Sherafatian Masih, Arjmand Fateme
Department of Molecular Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran 14115-111, Iran.
Department of Genetics and Molecular Medicine, Zanjan University of Medical Sciences, Zanjan 45139-56184, Iran.
Oncol Lett. 2019 Aug;18(2):2125-2131. doi: 10.3892/ol.2019.10462. Epub 2019 Jun 10.
Lung cancer has the world's highest cancer-associated mortality rate, making biomarker discovery for this cancer a pressing issue. Machine learning approaches to identify molecular biomarkers are not as prevalent as screening of potential biomarkers by differential expression analysis. However, several differentially expressed microRNAs (miRNAs) involved in cancer have been identified using this approach. The availability of The Cancer Genome Atlas (TCGA) allows machine learning methods to be used for the molecular profiling of tumors. The present study used empirical negative control miRNAs in lung cancer to normalize the lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) datasets from TCGA, and then built decision tree models to classify lung cancer status and subtype. The two primary classification models together used four miRNAs, two for lung cancer diagnosis and two for subtyping: hsa-miR-183 and hsa-miR-135b distinguished lung tumors from normal samples taken from tissue adjacent to the tumor site, and hsa-miR-944 and hsa-miR-205 further classified the tumors into the major LUAD and LUSC subtypes. Subtype-specific cancer status classification models were also presented for LUAD and LUSC.
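The abstract does not specify the exact normalization procedure or the tree-growing parameters. The following is therefore only a minimal sketch of the general workflow it describes: normalize each sample against negative-control miRNAs, fit one small decision tree for tumor vs. normal and another for LUAD vs. LUSC, and apply the subtype tree only to samples called tumor. Several details are assumptions and are not taken from the study. Normalization here simply subtracts each sample's mean log2 expression of the control miRNAs. This stand-in may differ from the method the authors actually used. The trees are grown with a pure-Python, CART-style Gini split limited to depth 2. The control names `NC1`/`NC2`, all helper functions, and every number are hypothetical toy values. They are not TCGA data and not the paper's fitted thresholds.

```python
from collections import Counter
from statistics import mean

# Hypothetical placeholder names for the empirical negative-control miRNAs.
CONTROLS = ["NC1", "NC2"]
STATUS_MIRS = ["hsa-miR-183", "hsa-miR-135b"]  # tumor vs. adjacent normal
SUBTYPE_MIRS = ["hsa-miR-944", "hsa-miR-205"]  # LUAD vs. LUSC

def normalize(sample, controls=CONTROLS):
    """Assumed stand-in: center log2 values on the mean of the control miRNAs."""
    offset = mean(sample[m] for m in controls)
    return {m: v - offset for m, v in sample.items()}

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return (weighted impurity, feature, threshold) of the best split, or None."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent observed values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def fit_tree(rows, labels, features, max_depth=2):
    """Grow a small CART-style classification tree; leaves hold the majority label."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels, features)
    if split is None:  # no feature varies, so no split is possible
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": fit_tree([r for r, _ in left], [y for _, y in left], features, max_depth - 1),
        "right": fit_tree([r for r, _ in right], [y for _, y in right], features, max_depth - 1),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

def make_sample(offset, m183, m135b, m944, m205):
    """Toy raw log2 profile: controls average to `offset`; targets sit at offset + level."""
    sample = {"NC1": offset + 0.5, "NC2": offset - 0.5}
    for name, rel in zip(STATUS_MIRS + SUBTYPE_MIRS, (m183, m135b, m944, m205)):
        sample[name] = offset + rel
    return sample

def classify(sample, status_tree, subtype_tree):
    """Two-stage call: tumor vs. normal first, then LUAD vs. LUSC for tumors only."""
    x = normalize(sample)
    if predict(status_tree, x) == "normal":
        return "normal"
    return predict(subtype_tree, x)

# Invented toy training set (not TCGA values): (raw profile, status, subtype).
train = [
    (make_sample(10.0, -2.0, -3.0, -4.0, 0.0), "normal", None),
    (make_sample(12.0, -1.5, -2.5, -4.0, 1.0), "normal", None),
    (make_sample(8.0, 2.0, 1.0, -3.0, 1.0), "tumor", "LUAD"),
    (make_sample(10.0, 3.0, 2.0, -2.5, 0.5), "tumor", "LUAD"),
    (make_sample(12.0, 2.5, 1.5, 3.0, 5.0), "tumor", "LUSC"),
    (make_sample(8.0, 1.5, 2.5, 2.0, 4.0), "tumor", "LUSC"),
]
rows = [normalize(s) for s, _, _ in train]
status_tree = fit_tree(rows, [st for _, st, _ in train], STATUS_MIRS)
tumors = [(r, sub) for r, (_, _, sub) in zip(rows, train) if sub is not None]
subtype_tree = fit_tree([r for r, _ in tumors], [sub for _, sub in tumors], SUBTYPE_MIRS)

print(classify(make_sample(9.0, 2.0, 2.0, 2.5, 4.5), status_tree, subtype_tree))  # → LUSC
```

The status tree is trained on all samples. The subtype tree is trained on tumor samples only. This ordering reflects the abstract's description of first separating tumors from adjacent normal tissue and then subtyping the tumors. In the toy data the raw intensities are deliberately shifted by different per-sample offsets, so the learned thresholds only make sense after control-based normalization.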