Xie Guobo, Li Dayin, Lin Zhiyi, Gu Guosheng, Li Weijun, Chen Ruibin, Liu Zhenguo
School of Computer Science, Guangdong University of Technology, Guangzhou 510006, China.
Department of Thoracic Surgery, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou 510080, China.
J Chem Inf Model. 2024 Dec 23;64(24):9594-9608. doi: 10.1021/acs.jcim.4c01070. Epub 2024 Jul 26.
Existing matrix factorization methods face challenges, including the cold start problem and global nonlinear data loss during similarity learning, particularly in predicting associations between long noncoding RNAs (LncRNAs) and diseases. To overcome these issues, we introduce HPTRMF, a matrix factorization approach incorporating high-order perturbation and flexible trifactor regularization. HPTRMF constructs a high-order correlation matrix from the known association matrix, leveraging high-order perturbation to address the cold start problem caused by data sparsity. Additionally, HPTRMF incorporates a flexible trifactor regularization term that captures similarity information on LncRNAs and diseases, retaining global nonlinear information in the similarity matrix that would otherwise be lost. Experimental results demonstrate the superiority of HPTRMF over nine state-of-the-art algorithms in Leave-One-Out Cross-Validation (LOOCV) and Five-Fold Cross-Validation (5-Fold CV) on three data sets. HPTRMF and the data sets are available at https://github.com/Llvvvv/HPTRMF.
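The abstract does not give the formula for the high-order correlation matrix, so the following is only an illustrative sketch of the general idea of enriching a sparse association matrix with higher-order paths. It assumes a simple construction, A + alpha * (A Aᵀ A), where the cubic term counts three-step paths (lncRNA to disease to lncRNA to disease) in the bipartite association graph. The function names, the weight `alpha`, and the toy matrix are hypothetical and are not taken from HPTRMF; plain Python lists are used to keep the example dependency-free.

```python
def transpose(x):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*x)]


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    y_cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


def high_order_scores(assoc, alpha=0.5):
    """Return assoc + alpha * (assoc @ assoc.T @ assoc).

    Assumption: this truncated path expansion stands in for a
    "high-order correlation matrix"; it is not the paper's definition.
    alpha is a hypothetical weight on the three-step path counts.
    """
    third = matmul(matmul(assoc, transpose(assoc)), assoc)
    return [
        [a + alpha * t for a, t in zip(row_a, row_t)]
        for row_a, row_t in zip(assoc, third)
    ]


# Toy association matrix: rows are lncRNAs, columns are diseases.
assoc = [
    [1, 1, 0],  # lncRNA 0 is linked to diseases 0 and 1
    [1, 0, 0],  # lncRNA 1 is linked to disease 0 only
    [0, 0, 1],  # lncRNA 2 is linked to disease 2 only
]

scores = high_order_scores(assoc)
for row in scores:
    print(row)
```

In this toy case, lncRNA 1 has no recorded link to disease 1, yet it receives a nonzero score there because it shares disease 0 with lncRNA 0, which is linked to disease 1. Pairs with no connecting path, such as lncRNA 1 and disease 2, stay at zero. A row with no known associations at all would also stay at zero under this construction, which is one reason similarity information is needed alongside it.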