Yao Fang, Zhang Chi, Du Wei, Liu Chao, Xu Ying
Key Laboratory for Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China; Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, Athens, United States of America; Jilin Teachers' Institute of Engineering and Technology, Changchun, China.
Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, Athens, United States of America.
PLoS One. 2015 Sep 16;10(9):e0138213. doi: 10.1371/journal.pone.0138213. eCollection 2015.
The grade of a cancer is a measure of its malignancy level, and the stage of a cancer refers to the tumor's size and the extent to which the cancer has spread. Here we present a computational method for predicting gene signatures and blood/urine protein markers for breast cancer grades and stages based on RNA-seq data. The data were retrieved from the TCGA breast cancer dataset and cover 111 pairs of disease and matching adjacent noncancerous tissues with pathologist-assigned stages and grades. By applying differential expression analysis and an SVM-based classification approach, we found 324 and 227 genes whose expression levels in cancer are consistently up-regulated relative to their matching controls in a grade-dependent and a stage-dependent manner, respectively. Using these genes, we predicted a 9-gene panel as a gene signature for distinguishing poorly differentiated from moderately and well differentiated breast cancers, and a 19-gene panel as a gene signature for discriminating between moderately and well differentiated breast cancers. Similarly, a 30-gene panel and a 21-gene panel are predicted as gene signatures for distinguishing advanced-stage (stages III-IV) from early-stage (stages I-II) cancer samples and for distinguishing stage II from stage I samples, respectively. We expect that these gene panels can be used as gene-expression signatures for cancer grade and stage classification. In addition, of the 324 grade-dependent genes, 188 and 66 encode proteins that are predicted to be blood-secretory and urine-excretory, respectively; of the 227 stage-dependent genes, 123 and 51 encode proteins predicted to be blood-secretory and urine-excretory, respectively. We anticipate that some combinations of these blood and urine proteins could serve as markers for monitoring breast cancer at specific grades and stages through blood and urine tests.
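To make the idea of "consistently up-regulated vs. matching controls" concrete, the following is a minimal sketch of a paired fold-change filter. It is not the authors' procedure: the abstract does not state the criterion used, so the pseudocount, the log2 fold-change cutoff, the required fraction of pairs, the function names, and the toy expression values are all assumptions made for illustration. The SVM-based classification step is not shown.

```python
import math


def log2_fold_changes(tumor, normal, pseudocount=1.0):
    """Per-pair log2 fold change for one gene (tumor vs. matched normal).

    The pseudocount is an assumed choice to avoid division by zero.
    """
    return [
        math.log2((t + pseudocount) / (n + pseudocount))
        for t, n in zip(tumor, normal)
    ]


def consistently_up(tumor, normal, min_log2fc=1.0, min_fraction=0.8):
    """True if at least min_fraction of pairs have log2FC >= min_log2fc.

    Both thresholds are hypothetical; the abstract does not specify them.
    """
    lfc = log2_fold_changes(tumor, normal)
    hits = sum(1 for x in lfc if x >= min_log2fc)
    return hits / len(lfc) >= min_fraction


# Hypothetical toy data: gene -> (tumor values, matched normal values),
# one value per tissue pair. Gene names and numbers are invented.
expression = {
    "GENE_A": ([30.0, 45.0, 28.0, 60.0], [5.0, 9.0, 6.0, 10.0]),
    "GENE_B": ([10.0, 8.0, 12.0, 9.0], [9.0, 10.0, 11.0, 8.0]),
}

up_genes = [g for g, (t, n) in expression.items() if consistently_up(t, n)]
print(up_genes)  # → ['GENE_A']
```

In this toy input, GENE_A is more than twofold higher in every tumor sample than in its matched control and passes the filter, while GENE_B shows no consistent change and is dropped. In a grade- or stage-dependent analysis, a filter of this kind would presumably be applied within each grade or stage group before any classifier is trained.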