Xu Chaohan, Qi Rui, Ping Yanyan, Li Jie, Zhao Hongying, Wang Li, Du Michael Yifei, Xiao Yun, Li Xia
College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China.
Weston High School of Massachusetts, Massachusetts, USA.
Oncotarget. 2017 Feb 14;8(7):12041-12051. doi: 10.18632/oncotarget.14510.
Although lncRNAs have emerged as a major class of regulatory molecules involved in normal cellular physiology and disease, our knowledge of them remains very limited, and discovering novel disease-related lncRNAs in cancers has become a major research challenge. Based on the assumption that diverse diseases with similar phenotype associations show similar molecular mechanisms, we present a pan-cancer network-based prioritization approach that systematically identifies disease-specific risk lncRNAs by integrating disease phenotype associations. We applied this strategy to approximately 2800 tumor samples from 14 cancer types to prioritize disease risk lncRNAs. Our approach yielded an average area under the ROC curve (AUC) of 80.66%, with the highest AUC (98.14%) for medulloblastoma. When evaluated using leave-one-out cross-validation (LOOCV) for prioritization of disease candidate genes, it achieved an average AUC of 97.16%. Moreover, we demonstrated the robustness of this approach and the importance of each integrated component: disease phenotype associations, known disease genes, and the number of cancer types. Taking glioblastoma multiforme as a case study, we identified the candidate lncRNA gene SNHG1 as a novel disease risk factor for diagnosis and prognosis. In summary, we provide a novel lncRNA prioritization approach that integrates pan-cancer phenotype associations and could help researchers better understand the important roles of lncRNAs in human cancers.
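To make the evaluation scheme above concrete, the following is a minimal sketch of leave-one-out cross-validation for a network-based prioritization, scored by AUC. The abstract does not state which propagation algorithm the authors used, so random walk with restart is an assumption chosen here purely for illustration. The six-node network, the set of "known disease genes", the restart probability, and all function names are hypothetical.

```python
# Illustrative sketch only: random walk with restart (RWR) is an ASSUMED
# scoring method, and the toy network below is hypothetical data.

def normalize_columns(adj):
    """Return a column-normalized copy of a square adjacency matrix."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0
             for j in range(n)] for i in range(n)]

def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Score every node by its steady-state proximity to the seed nodes."""
    n = len(adj)
    w = normalize_columns(adj)
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

def loocv_auc(adj, known):
    """Hold out each known gene in turn and rank it against unlabeled nodes.

    The per-fold AUC is the fraction of unlabeled candidates that score
    below the held-out gene (ties count as one half).
    """
    candidates = [i for i in range(len(adj)) if i not in known]
    fold_aucs = []
    for held_out in sorted(known):
        scores = random_walk_with_restart(adj, known - {held_out})
        wins = sum(1.0 if scores[held_out] > scores[c]
                   else 0.5 if scores[held_out] == scores[c] else 0.0
                   for c in candidates)
        fold_aucs.append(wins / len(candidates))
    return sum(fold_aucs) / len(fold_aucs)

# Hypothetical network: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
known_genes = {0, 1, 2}  # hypothetical "known disease genes"

mean_auc = loocv_auc(adjacency, known_genes)
print(f"Mean LOOCV AUC: {mean_auc:.4f}")
```

In this toy setting the known genes form a tight cluster, so each held-out gene is expected to rank above the unlabeled nodes. Real expression-derived networks are far noisier, and the figures reported in the abstract come from the authors' own pipeline, not from a sketch like this one.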