Li Dan, Yang William, Zhang Yifan, Yang Jack Y, Guan Renchu, Xu Dong, Yang Mary Qu
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science & Technology, Jilin University, Changchun, 130012, China.
MidSouth Bioinformatics Center and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and Univ. of Arkansas Medical Sciences, 2801 S. Univ. Ave, Little Rock, AR, 72204, USA.
BMC Med Genomics. 2018 Nov 20;11(Suppl 5):106. doi: 10.1186/s12920-018-0413-3.
Non-small cell lung cancer (NSCLC) accounts for more than 80% of lung cancer cases. Early-stage NSCLC can be treated by complete resection, with a good prognosis. However, most cases are detected at a late stage of the disease. The average survival rate of patients with invasive lung cancer is only about 4%. Adenocarcinoma in situ (AIS) is an intermediate subtype of lung adenocarcinoma that exhibits early-stage growth patterns but can progress to invasive disease.
In this study, we used RNA-seq data from normal, AIS, and invasive lung cancer tissues to identify a gene module, termed AIS-specific genes, that represents the distinguishing characteristics of AIS. Two differential expression analysis algorithms were employed to identify the AIS-specific genes. Then, the best-performing subset of AIS-specific genes for early lung cancer prediction was selected by random forest. Finally, the performance of early lung cancer prediction was assessed using random forest, support vector machine (SVM), and artificial neural networks (ANNs) on four independent early lung cancer datasets, including one tumor-educated blood platelets (TEPs) dataset.
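The selection-then-classification workflow described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes scikit-learn, uses a synthetic matrix in place of real expression data, ranks features by random forest importance, keeps the top 20, and then scores a random forest, an SVM, and a small neural network on a held-out split. All hyperparameters, the 70/30 split, and the variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix: 300 samples x 107 candidate genes.
# (Assumption: real data would be normalized expression values with tumor/normal labels.)
X, y = make_classification(n_samples=300, n_features=107, n_informative=15,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Step 1: rank candidate features by random forest importance, using training data only.
ranker = RandomForestClassifier(n_estimators=500, random_state=0)
ranker.fit(X_train, y_train)
top20 = np.argsort(ranker.feature_importances_)[::-1][:20]

# Step 2: train three classifier families on the selected features.
models = {
    "random_forest": RandomForestClassifier(n_estimators=500, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "ann": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                                       random_state=0)),
}

# Step 3: report held-out accuracy for each model.
scores = {}
for name, model in models.items():
    model.fit(X_train[:, top20], y_train)
    scores[name] = model.score(X_test[:, top20], y_test)
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Ranking features on the training split only keeps the held-out samples out of the selection step, which is the same concern that motivates evaluating on independent datasets.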
Based on the differential expression analysis, 107 AIS-specific genes were identified, comprising 93 protein-coding genes and 14 long non-coding RNAs (lncRNAs). The significant functions associated with these genes include angiogenesis and ECM-receptor interaction, which are closely related to cancer development and contribute to non-smoking-related lung cancers. Moreover, 12 of the AIS-specific lncRNAs may be involved in lung cancer progression by regulating the ECM-receptor interaction pathway. Feature selection by random forest, using The Cancer Genome Atlas (TCGA) lung adenocarcinoma dataset, identified 20 of the AIS-specific genes as early-stage lung cancer signatures. Of the 20 signatures, two were lncRNAs, BLACAT1 and CTD-2527I21.15, which have been reported to be associated with bladder cancer, colorectal cancer, and breast cancer. In blind classification on three independent tissue sample datasets, these signature genes consistently yielded about 98% accuracy in distinguishing early-stage lung cancer from normal cases. However, the prediction accuracy for the blood platelet samples was only 64.35% (sensitivity 78.1%, specificity 50.59%, and AUROC 0.747).
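For readers less familiar with the metrics quoted above, the following sketch shows how sensitivity, specificity, and accuracy are derived from a binary confusion matrix. The counts are hypothetical and chosen only for illustration; they are not taken from the study, and the function name `binary_metrics` is an assumption of this sketch.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # fraction of true cancer cases detected
    specificity = tn / (tn + fp)                 # fraction of normal cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # fraction of all samples classified correctly
    return sensitivity, specificity, accuracy


# Hypothetical counts: 100 cancer samples and 100 normal samples.
sens, spec, acc = binary_metrics(tp=78, fn=22, tn=51, fp=49)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

When the two classes are equal in size, accuracy equals the mean of sensitivity and specificity. The reported platelet figures satisfy this relation to rounding, since (78.1 + 50.59) / 2 = 64.345, although the abstract does not state the class sizes.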
The comparison of AIS with normal and invasive tumor tissues revealed disease-specific genes and offered new insights into the mechanism underlying AIS progression to an invasive tumor. These genes can also serve as signatures for the early diagnosis of lung cancer with high accuracy. The expression profile of gene signatures identified from cancer tissue samples yielded highly accurate early cancer prediction for tissue samples but relatively lower accuracy for blood platelet samples.