Pineda Arturo López, Ogoe Henry Ato, Balasubramanian Jeya Balaji, Rangel Escareño Claudia, Visweswaran Shyam, Herman James Gordon, Gopalakrishnan Vanathi
Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, 15206, Pittsburgh, PA, USA.
Department of Computational Genomics, National Institute of Genomic Medicine, Periferico Sur No. 4809, Col. Arenal Tepepan, Tlalpan, 14610, Mexico City, Mexico.
BMC Cancer. 2016 Mar 4;16:184. doi: 10.1186/s12885-016-2223-3.
Adenocarcinoma (ADC) and squamous cell carcinoma (SCC) are the most prevalent histological types of lung cancer. Distinguishing between these subtypes is critically important because they have different implications for prognosis and treatment. Normally, the two are distinguished by histopathological analysis of tissue obtained from small endoscopic samples or needle aspirations. However, the lack of cell architecture in these small tissue samples hampers the distinction. Molecular profiling can also discriminate between the two subtypes, provided that the biopsy consists of at least 50% tumor cells. In some cases, however, a biopsy contains a mix of tumor and tumor-adjacent histologically normal (TAHN) tissue. When this happens, a new biopsy is required, with the associated cost, risk, and discomfort to the patient. To avoid this problem, we hypothesize that a computational method can distinguish between lung cancer subtypes given tumor and TAHN tissue.
Using publicly available gene expression and DNA methylation datasets, we defined four classification tasks based on the possible combinations of tumor and TAHN tissue. First, we used a feature selector (ReliefF/Limma) to select relevant variables, which were then used to build a simple naïve Bayes classification model. Next, we evaluated the classification performance of our models by measuring the area under the receiver operating characteristic curve (AUC). Finally, we analyzed the relevance of the selected genes using hierarchical clustering and IPA® software for gene functional analysis.
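The select-then-classify-then-score workflow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, a simple two-sample t-like filter stands in for ReliefF/Limma, the naïve Bayes model assumes Gaussian features, and all function names (`make_toy_data`, `select_features`, `fit_gaussian_nb`, `log_odds`, `auc`) are hypothetical. It covers a single binary task rather than the four tasks in the study.

```python
import math
import random
from statistics import mean, pvariance


def make_toy_data(n_per_class=40, n_features=50, n_informative=5, seed=0):
    """Synthetic stand-in for an expression matrix (rows: samples, columns: features)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:  # shift the first few features in class 1
                for j in range(n_informative):
                    row[j] += 1.5
            X.append(row)
            y.append(label)
    return X, y


def select_features(X, y, k):
    """Assumed filter (NOT ReliefF/Limma): keep the k columns with the largest |t-like| score."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        se = math.sqrt(pvariance(a) / len(a) + pvariance(b) / len(b)) + 1e-9
        scores.append((abs(mean(a) - mean(b)) / se, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def fit_gaussian_nb(X, y, cols):
    """Per class: log prior plus a (mean, variance) pair for each selected column."""
    model = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        stats = [(mean(r[j] for r in rows), pvariance([r[j] for r in rows]) + 1e-6)
                 for j in cols]
        model[label] = (math.log(len(rows) / len(X)), stats)
    return model


def log_odds(model, row, cols):
    """Log posterior odds of class 1 versus class 0 under feature independence."""
    total = {}
    for label, (log_prior, stats) in model.items():
        ll = log_prior
        for j, (mu, var) in zip(cols, stats):
            ll += -0.5 * math.log(2 * math.pi * var) - (row[j] - mu) ** 2 / (2 * var)
        total[label] = ll
    return total[1] - total[0]


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Stratified 70/30 split so both classes appear in the held-out set.
X, y = make_toy_data()
rng = random.Random(1)
train_idx, test_idx = [], []
for label in (0, 1):
    idx = [i for i, lab in enumerate(y) if lab == label]
    rng.shuffle(idx)
    cut = int(0.7 * len(idx))
    train_idx += idx[:cut]
    test_idx += idx[cut:]

X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

# Features are selected on the training split only, to avoid information leakage.
selected = select_features(X_train, y_train, k=5)
model = fit_gaussian_nb(X_train, y_train, selected)
test_auc = auc([log_odds(model, row, selected) for row in X_test], y_test)
print(f"selected columns: {sorted(selected)}; held-out AUC: {test_auc:.3f}")
```

Selecting features inside the training split is a design choice of this sketch; the abstract does not state how the study partitioned its data for selection and evaluation.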
All Bayesian models achieved high classification performance (AUC > 0.94), which was confirmed by hierarchical cluster analysis. Of the genes selected, 25 (93%) were found to be related to cancer (19 were associated with ADC or SCC), confirming the biological relevance of our method.
The results of this study confirm that computational methods using tumor and TAHN tissue can serve as a prognostic tool for lung cancer subtype classification. Our study complements results from other studies in which TAHN tissue has been used as a prognostic tool for prostate cancer. The clinical implications of this finding could greatly benefit lung cancer patients.