Liu Zihao, Zhang Ying, Han Xudong, Li Chenxi, Yang Xuhui, Gao Jie, Xie Ganfeng, Du Nan
Department of Oncology, Medical School of Chinese PLA, Chinese PLA General Hospital, Beijing, China.
Department of Oncology, The Fourth Medical Center, Chinese PLA General Hospital, Beijing, China.
Front Cell Dev Biol. 2020 Aug 11;8:637. doi: 10.3389/fcell.2020.00637. eCollection 2020.
Millions of people suffer from cancer, yet accurate early diagnosis and effective treatment remain difficult. In recent years, long non-coding RNAs (lncRNAs) have been shown to play an important role in diseases, especially cancers. These lncRNAs exert their functions by regulating gene expression. Identifying cancer-related lncRNAs could therefore help researchers gain a deeper understanding of cancer mechanisms and find treatment options. A large number of relationships between lncRNAs and cancers have been verified by biological experiments, which makes it possible to use computational methods to identify cancer-related lncRNAs. In this paper, we applied a convolutional neural network (CNN) to identify cancer-related lncRNAs from their target genes and their tissue expression specificity. Because lncRNAs regulate target gene expression and have been reported to show tissue expression specificity, the target genes of each lncRNA and its expression in different tissues were used as its features. A deep belief network (DBN) was then used to encode these features in an unsupervised manner. Finally, the CNN was used to predict cancer-related lncRNAs based on known relationships between lncRNAs and cancers. We built one CNN model per cancer type to predict the lncRNAs related to that cancer, and identified additional related lncRNAs for 41 types of cancer. Ten-fold cross-validation was used to evaluate the performance of our method. The results showed that our method outperformed several previous methods, with an area under the curve (AUC) of 0.81 and an area under the precision-recall curve (AUPR) of 0.79. Case studies were carried out to verify the accuracy of our results.
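The evaluation protocol described in the abstract (ten-fold cross-validation scored by AUC and AUPR) can be illustrated with a minimal, standard-library-only sketch. Everything below is an assumption made for illustration: the data are synthetic, the folds are stratified by label (the abstract does not say how folds were built), and `centroid_scorer` is a hypothetical placeholder classifier, not the DBN encoder or CNN predictor used in the paper.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).
    Assumes at least one positive label is present."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / hits


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds, spreading each class evenly across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def centroid_scorer(train_x, train_y):
    """Hypothetical stand-in model: score a sample by how much closer it is
    to the positive-class centroid than to the negative-class centroid."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos_c = mean([x for x, y in zip(train_x, train_y) if y == 1])
    neg_c = mean([x for x, y in zip(train_x, train_y) if y == 0])

    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c))
        return d_neg - d_pos

    return score


def cross_validate(x, y, k=10, seed=0):
    """Return mean AUC and mean AUPR over k held-out folds."""
    aucs, auprs = [], []
    for fold in stratified_folds(y, k, seed):
        held = set(fold)
        train_x = [x[i] for i in range(len(x)) if i not in held]
        train_y = [y[i] for i in range(len(y)) if i not in held]
        score = centroid_scorer(train_x, train_y)
        fold_scores = [score(x[i]) for i in fold]
        fold_labels = [y[i] for i in fold]
        aucs.append(roc_auc(fold_labels, fold_scores))
        auprs.append(average_precision(fold_labels, fold_scores))
    return sum(aucs) / k, sum(auprs) / k


# Synthetic example: 50 "related" and 50 "unrelated" samples, 8 features each.
rng = random.Random(1)
y = [1] * 50 + [0] * 50
x = [[rng.gauss(1.0 if label else 0.0, 1.0) for _ in range(8)] for label in y]
mean_auc, mean_aupr = cross_validate(x, y)
print(f"mean AUC = {mean_auc:.3f}, mean AUPR = {mean_aupr:.3f}")
```

In a per-cancer setting such as the one the abstract describes, a loop of this kind would be run once for each cancer type, with the labels marking which lncRNAs are known to be related to that cancer. The numbers printed here come from the synthetic data and say nothing about the reported 0.81 and 0.79.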