Zhang Silu, Wang Junqing, Ghoshal Torumoy, Wilkins Dawn, Mo Yin-Yuan, Chen Yixin, Zhou Yunyun
Department of Computer and Information Science, University of Mississippi, Oxford, MS 38677, USA.
Department of Surgery, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China.
Genes (Basel). 2018 Jan 26;9(2):65. doi: 10.3390/genes9020065.
Breast cancer is intrinsically heterogeneous and is commonly classified into four main subtypes associated with distinct biological features and clinical outcomes. However, currently available data resources and methods for molecular subtyping are largely limited to protein-coding genes, and little is known about the roles of long non-coding RNAs (lncRNAs), which arise from the non-coding regions that make up about 98% of the genome. lncRNAs may also help stratify cancer patients into subgroups and may be associated with clinical phenotypes. The purpose of this project was to identify lncRNA gene signatures that are associated with breast cancer subtypes and clinical outcomes. We identified subtype-associated lncRNA gene signatures from The Cancer Genome Atlas (TCGA) RNA-seq data using an optimized 1-norm support vector machine (SVM) feature selection algorithm. We evaluated the prognostic performance of these gene signatures with a semi-supervised principal component (superPC) method. Although lncRNAs alone can predict breast cancer subtypes with satisfactory accuracy, a combined gene signature that includes both coding and non-coding genes gives the best clinically relevant prediction performance. We highlight eight potential biomarkers (three coding genes and five non-coding genes) that are significantly associated with survival outcomes. The proposed methods offer a novel means of identifying subtype-specific coding and non-coding potential biomarkers that are both clinically relevant and biologically significant.
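The feature selection idea named in the abstract can be illustrated with a small sketch. A 1-norm SVM penalizes the sum of absolute weights, which pushes the weights of uninformative features to exactly zero, so the features that keep nonzero weights form the selected signature. The code below is not the authors' optimized algorithm: it is a generic proximal subgradient solver for a binary problem (the study involves four subtypes), and the toy data, the function names (`soft_threshold`, `fit_l1_svm`, `make_toy_data`), and the values of `lam`, `step`, and `n_iter` are all assumptions chosen for illustration.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_l1_svm(X, y, lam=0.5, step=0.05, n_iter=300):
    """Minimize mean hinge loss + lam * ||w||_1 by proximal subgradient descent.

    Illustrative solver only; lam, step, and n_iter are assumed values.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # only margin violators contribute to the subgradient
                for j in range(p):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        # Subgradient step on the hinge loss, then the L1 proximal (shrinkage) step.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
        b -= step * grad_b  # the intercept is not penalized
    return w, b


def make_toy_data(n=200, p=8, seed=0):
    """Hypothetical data: features 0 and 1 carry the class signal, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        row[0] += 2.0 * label
        row[1] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
w, b = fit_l1_svm(X, y)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("selected features:", selected)
```

In this toy setting the two informative features keep positive weights while the penalty holds the noise features at or near zero. On real expression data the penalty strength would normally be tuned, for example by cross-validation, and a multiclass scheme would be needed for four subtypes.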