Lin Qin, Sheng Jie, Zhou Chang, Xiao Tao, Meng Yilei, Lu Mingxin, Zhang Junfang, Yan Xueyun, Peng Luying, Cao Huaming, Li Li
State Key Laboratory of Cardiovascular Diseases and Medical Innovation Center, Shanghai East Hospital, School of Medicine, Tongji University, Shanghai 200120, China.
Shanghai Arrhythmias Research Center, Shanghai East Hospital, Tongji University School of Medicine, Shanghai 200120, China.
Comput Struct Biotechnol J. 2025 May 29;27:2323-2335. doi: 10.1016/j.csbj.2025.05.050. eCollection 2025.
Long non-coding RNAs (lncRNAs) play important roles in physiological and pathological processes. Accurately predicting lncRNA-protein interactions (LPIs) is a vital strategy for clarifying the functions and pathogenic mechanisms of lncRNAs. However, the utility and generalization of current computational methods for predicting LPIs leave significant room for improvement. In this study, splitting the data with protein clusters as group information reveals that many LPI prediction methods suffer from generalization flaws, because ignoring the biological properties of LPIs causes data leakage. To address this issue, we present LPItabformer, a tabular Transformer framework for predicting LPIs that incorporates a domain shifts with uncertainty (DSU) module to enhance generalization. LPItabformer demonstrates the capacity to alleviate generalization challenges arising from biases in LPI data and from preferences in protein binding patterns. In addition, LPItabformer shows greater robustness and generalization than state-of-the-art methods on human and mouse LPI datasets. Finally, we verified that LPItabformer is capable of predicting novel LPIs. Code is available at https://github.com/Ci-TJ/LPItabformer.
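The data-leakage argument rests on the idea that a random pair-level split can place proteins from the same cluster in both training and test sets, so a model can score well by recognizing familiar proteins rather than generalizing. The abstract does not say how the protein clusters are built or exactly how the split is performed, so the following is only a minimal sketch of group-aware splitting under assumed inputs: `group_split`, the `protein_cluster` mapping, and the `(lncRNA, protein, label)` triple layout are hypothetical names chosen for illustration. Library routines such as scikit-learn's `GroupShuffleSplit` or `GroupKFold` serve the same purpose.

```python
import random


def group_split(pairs, protein_cluster, test_fraction=0.2, seed=0):
    """Hypothetical helper: split (lncRNA, protein, label) triples so that each
    protein cluster falls entirely in either the training or the test set.

    Note: test_fraction is the fraction of *clusters* held out, not of pairs.
    """
    clusters = sorted({protein_cluster[p] for _, p, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(clusters)
    # Hold out at least one cluster, but keep at least one cluster for training
    # whenever two or more clusters exist.
    n_test = min(max(1, round(len(clusters) * test_fraction)), len(clusters) - 1)
    test_clusters = set(clusters[:n_test])
    train, test = [], []
    for pair in pairs:
        (test if protein_cluster[pair[1]] in test_clusters else train).append(pair)
    return train, test


if __name__ == "__main__":
    # Toy, made-up identifiers: P1 and P2 share cluster C1.
    protein_cluster = {"P1": "C1", "P2": "C1", "P3": "C2", "P4": "C3"}
    pairs = [("L1", "P1", 1), ("L2", "P2", 0), ("L1", "P3", 1),
             ("L3", "P4", 0), ("L2", "P3", 0)]
    train, test = group_split(pairs, protein_cluster, test_fraction=0.34)
    print("train:", train)
    print("test:", test)
```

Because whole clusters move together, P1 and P2 can never end up on opposite sides of the split, which is exactly the leakage a pair-level random split would allow.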