IEEE/ACM Trans Comput Biol Bioinform. 2020 Sep-Oct;17(5):1639-1647. doi: 10.1109/TCBB.2019.2907536. Epub 2019 Mar 26.
Accurate prioritization of potential disease genes is a fundamental challenge in biomedical research, and various algorithms have been developed to address it. Inductive Matrix Completion (IMC) is one of the most reliable models, owing to its well-established framework and its superior performance in predicting gene-disease associations. However, IMC does not hierarchically extract deep features, which might limit the quality of recovery. To address this, our Deep Collaborative Filtering (DCF) model applies a deep learning architecture to the side information of genes; this architecture obtains high-level representations and handles the noise and outliers present in large-scale biological datasets. Further, because negative examples are lacking, we also apply a Positive-Unlabeled (PU) learning formulation to low-rank matrix completion. Our approach achieves substantially improved performance over other state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database. It is 10 percent more efficient than standard IMC in detecting a true association, and it significantly outperforms other alternatives in terms of the precision-recall metric at the top-k predictions. Moreover, we also validate DCF on diseases with no previously known gene associations and on newly reported OMIM associations. The experimental results show that DCF remains satisfactory for ranking novel disease phenotypes as well as for mining unexplored relationships. The source code and the data are available at https://github.com/xzenglab/DCF.
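To make the two ingredients named in the abstract more concrete, the following is a minimal sketch of a generic IMC bilinear score and a PU-style weighted squared loss. It is not the authors' implementation: the function names, the factor matrices `W` and `H`, the weight `alpha` on unlabeled entries, and the toy data are all assumptions made for illustration, and the deep feature extraction that DCF applies to gene side information is omitted, as is any regularization.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    """Return the transpose of a row-major matrix."""
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two row-major matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def imc_scores(X: Matrix, W: Matrix, H: Matrix, Y: Matrix) -> Matrix:
    """Generic IMC scores S = X W H^T Y^T.

    X: gene features (n_genes x f_g), Y: disease features (n_diseases x f_d),
    W: (f_g x k) and H: (f_d x k) are low-rank factors (hypothetical names).
    """
    return matmul(matmul(matmul(X, W), transpose(H)), transpose(Y))


def pu_weighted_loss(P: Matrix, S: Matrix, alpha: float) -> float:
    """Squared error with unlabeled entries down-weighted by alpha.

    Entries with P == 1 are known positives (weight 1). Entries with P == 0
    are unlabeled rather than true negatives, so they get weight alpha < 1.
    This weighting is an assumed, commonly used PU formulation.
    """
    loss = 0.0
    for p_row, s_row in zip(P, S):
        for p, s in zip(p_row, s_row):
            weight = 1.0 if p == 1 else alpha
            loss += weight * (p - s) ** 2
    return loss


if __name__ == "__main__":
    # Toy data: 2 genes, 2 diseases, identity features, rank-1 factors.
    X = [[1, 0], [0, 1]]
    Y = [[1, 0], [0, 1]]
    W = [[1], [0]]
    H = [[1], [0]]
    P = [[1, 0], [0, 1]]  # observed associations (1) vs. unlabeled (0)
    S = imc_scores(X, W, H, Y)
    print(S, pu_weighted_loss(P, S, alpha=0.5))
```

In a sketch like this, `W` and `H` would be fitted by minimizing the loss over the observed matrix `P`; a smaller `alpha` expresses less confidence that an unlabeled gene-disease pair is a true negative.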