Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China.
Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, 518172, China.
Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab479.
Advances in high-throughput experimental technologies have promoted the accumulation of vast amounts of biomedical data. Biomedical link prediction and single-cell RNA-sequencing (scRNA-seq) data imputation are two essential tasks in biomedical data analysis; both can facilitate various downstream studies and provide insights into the mechanisms of complex diseases. Both tasks can be transformed into matrix completion problems. Matrix factorization has shown promising performance on a variety of matrix completion tasks. However, the sparseness and high dimensionality of biomedical networks and scRNA-seq data raise new challenges. To address these challenges, various matrix factorization methods have emerged recently. In this paper, we present a comprehensive review of such matrix factorization methods and their usage in biomedical link prediction and scRNA-seq data imputation. Moreover, we select representative matrix factorization methods and conduct a systematic empirical comparison on 15 real data sets to evaluate their performance under different scenarios. By summarizing the experimental results, we provide general guidelines for selecting matrix factorization methods for different biomedical matrix completion tasks and point out some future directions for further improving the performance of biomedical link prediction and scRNA-seq data imputation.
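To make the idea of casting these tasks as matrix completion concrete, here is a minimal sketch of low-rank matrix factorization fitted only on observed entries. It is a generic illustration, not any specific method compared in the review. The function name `complete_matrix`, its parameters (`rank`, `lam`, `lr`, `n_iter`), and the synthetic data are all assumptions introduced for this example.

```python
import numpy as np

def complete_matrix(X, mask, rank=2, lam=0.1, lr=0.01, n_iter=2000, seed=0):
    """Estimate unobserved entries of X via a low-rank model X ~ U @ V.T.

    Hypothetical helper for illustration. Minimizes, by plain gradient descent,
    0.5 * ||mask * (X - U V^T)||_F^2 + 0.5 * lam * (||U||_F^2 + ||V||_F^2).
    mask holds 1 where X is observed and 0 elsewhere.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    U = 0.1 * rng.standard_normal((n_rows, rank))
    V = 0.1 * rng.standard_normal((n_cols, rank))
    for _ in range(n_iter):
        R = mask * (X - U @ V.T)       # residual on observed entries only
        grad_U = -R @ V + lam * U      # gradient of the objective w.r.t. U
        grad_V = -R.T @ U + lam * V    # gradient of the objective w.r.t. V
        U -= lr * grad_U
        V -= lr * grad_V
    return U @ V.T

# Synthetic example (assumed data): a rank-2 matrix with about 70% of entries observed.
rng = np.random.default_rng(42)
X_true = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
mask = (rng.random(X_true.shape) < 0.7).astype(float)
X_hat = complete_matrix(X_true * mask, mask)
```

In a link prediction setting, `X` would be an association matrix (for example, drugs by targets) with unknown pairs masked out. In scRNA-seq imputation it would be a gene-by-cell expression matrix in which suspected dropout zeros are treated as unobserved. The sparsity and dimensionality issues noted above are the reason practical methods add constraints or side information beyond this plain formulation.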