College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.
Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad097.
Gene essentiality is defined as the extent to which a gene is required for the survival and reproductive success of a living system. It can vary between genetic backgrounds and environments. Essential protein-coding genes have been well studied, but the essentiality of non-coding regions is rarely reported, even though most regions of the human genome do not encode proteins. There is therefore a need to determine the essentiality of non-coding genes. We developed the iEssLnc models, which assign essentiality scores to lncRNA genes. As far as we know, this is the first direct quantitative estimation of the essentiality of lncRNA genes. By taking advantage of a graph neural network with meta-path-guided random walks on the lncRNA-protein interaction network, iEssLnc models can perform genome-wide screening for essential lncRNA genes in a quantitative manner. We carried out validation and whole-genome screening in the context of human cancer cell lines and the mouse genome. In comparison with other methods, which were transferred from protein-coding genes, iEssLnc achieved better performance. Enrichment analysis indicated that essential lncRNA genes cluster at high ranks of the iEssLnc essentiality scores. Using the screening results of the iEssLnc models, we estimated the number of essential lncRNA genes in human and mouse. Functional analysis showed that essential lncRNA genes interact significantly with microRNAs and cytoskeletal proteins, which may be of interest in the experimental life sciences. All datasets and code of the iEssLnc models have been deposited on GitHub (https://github.com/yyZhang14/iEssLnc).
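To illustrate what a meta-path-guided random walk on a lncRNA-protein interaction network can look like, the following is a minimal Python sketch. It is not the iEssLnc implementation: the toy graph, the node names (`lnc1`, `protA`, and so on), the node-type labels `"L"` and `"P"`, the cyclic lncRNA-protein meta-path, and the function names are all assumptions made for illustration. In a full pipeline, walks of this kind would typically serve as input for learning node embeddings, which a downstream model could then turn into scores.

```python
import random

# Hypothetical toy network: "L" marks lncRNA nodes, "P" marks protein nodes.
NODE_TYPE = {
    "lnc1": "L", "lnc2": "L", "lnc3": "L",
    "protA": "P", "protB": "P",
}

# Hypothetical undirected interactions (lncRNA-protein and protein-protein).
EDGES = [
    ("lnc1", "protA"),
    ("lnc2", "protA"),
    ("lnc2", "protB"),
    ("lnc3", "protB"),
    ("protA", "protB"),
]


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Random walk whose i-th node must have type metapath[i % len(metapath)].

    The walk stops early if the current node has no neighbour of the
    required type.
    """
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first meta-path type")
    walk = [start]
    while len(walk) < walk_length:
        wanted = metapath[len(walk) % len(metapath)]
        candidates = [n for n in adj.get(walk[-1], []) if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbour of the required type
        walk.append(rng.choice(candidates))
    return walk


adj = build_adjacency(EDGES)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
walk = metapath_walk(adj, NODE_TYPE, "lnc1", ["L", "P"], 5, rng)
print(walk)
```

Restricting each step to neighbours of the type required by the meta-path is what distinguishes this from a plain random walk: the protein-protein edge in the toy graph is never followed under the `["L", "P"]` meta-path, so every walk alternates between lncRNA and protein nodes.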