Department of Veterinary and Animal Sciences, Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg C, Denmark.
Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen N, Denmark.
Bioinformatics. 2019 May 1;35(9):1494-1502. doi: 10.1093/bioinformatics/bty859.
Long non-coding RNAs (lncRNAs) are important regulators in a wide variety of biological processes and are linked to many diseases. Compared with protein-coding genes (PCGs), however, the associations between lncRNAs and diseases remain poorly studied. Inferring disease-associated lncRNAs on a genome-wide scale has therefore become imperative.
In this study, we propose DislncRF, a machine learning-based method that infers disease-associated lncRNAs on a genome-wide scale from tissue expression profiles. DislncRF trains random forest models on the expression profiles of known disease-associated PCGs across human tissues to learn general patterns linking expression profiles to diseases. These models are then applied to score associations between lncRNAs and diseases. DislncRF was benchmarked against a gold standard dataset and compared with other methods. The results show that DislncRF yields promising performance and outperforms the existing methods. The utility of DislncRF is further demonstrated on two diseases, for which the top-scoring candidates are supported by the literature or by independent datasets.
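The train-on-PCGs, score-lncRNAs idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one binary random forest per disease, uses scikit-learn's `RandomForestClassifier`, and runs on synthetic expression data. The function name `score_lncrnas`, the parameters `n_trees` and `seed`, and the simulated tissue signal are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def score_lncrnas(pcg_expr, pcg_labels, lncrna_expr, n_trees=200, seed=0):
    """Train on PCG tissue profiles for one disease, then score lncRNAs.

    Hypothetical helper: rows are genes, columns are tissues, and
    pcg_labels is 1 for known disease-associated PCGs and 0 otherwise.
    """
    model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    model.fit(pcg_expr, pcg_labels)
    # classes_ is sorted, so with 0/1 labels column 1 is the
    # predicted probability of the disease-associated class.
    return model.predict_proba(lncrna_expr)[:, 1]


# Synthetic data (an assumption for illustration, not real expression data):
# disease-associated PCGs are simulated as elevated in tissue 0.
rng = np.random.default_rng(0)
n_tissues = 10
positive_pcgs = rng.normal(0.0, 1.0, size=(100, n_tissues))
positive_pcgs[:, 0] += 3.0
negative_pcgs = rng.normal(0.0, 1.0, size=(100, n_tissues))

pcg_expr = np.vstack([positive_pcgs, negative_pcgs])
pcg_labels = np.array([1] * 100 + [0] * 100)

# Twenty simulated lncRNAs; the first five share the tissue-0 signal.
lncrna_expr = rng.normal(0.0, 1.0, size=(20, n_tissues))
lncrna_expr[:5, 0] += 3.0

scores = score_lncrnas(pcg_expr, pcg_labels, lncrna_expr)
ranking = np.argsort(-scores)  # lncRNA indices, highest score first
```

In this sketch the model never sees a labelled lncRNA: the labels come only from PCGs, and the lncRNAs are ranked by how closely their tissue profiles resemble those of the disease-associated PCGs.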
DislncRF is available at https://github.com/xypan1232/DislncRF.
Supplementary data are available at Bioinformatics online.