School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China.
Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China.
Bioinformatics. 2023 Sep 2;39(9). doi: 10.1093/bioinformatics/btad530.
A single gene may yield several isoforms with different functions through alternative splicing. Considerable effort has been devoted to developing machine-learning methods to predict isoform functions. However, existing methods do not consider the relevance of each feature to specific functions and ignore the noise introduced by irrelevant features. We therefore hypothesize that a feature selection framework that extracts function-relevant features may improve model accuracy in isoform function prediction.
In this article, we present IsoFrog, a feature selection-based approach for predicting isoform functions. First, IsoFrog adopts a reversible jump Markov Chain Monte Carlo (RJMCMC)-based feature selection framework to assess the importance of each feature to gene functions. Second, a sequential feature selection procedure is applied to select a subset of function-relevant features. This strategy retains the features relevant to a specific function while eliminating irrelevant ones, improving the effectiveness of the input features. Third, the selected features are fed into our proposed modified domain-invariant partial least squares (diPLS) method, which prioritizes the most likely positive isoform for each positive multiple-isoform gene (MIG) and uses diPLS for isoform function prediction. Tested on three datasets, our method outperforms six state-of-the-art methods, and the RJMCMC-based feature selection framework outperforms three classic feature selection methods. We expect the proposed methodology to promote the identification of isoform functions and to inspire the development of new methods.
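The second step, sequential feature selection guided by importance scores, can be illustrated with a minimal sketch. This is not the IsoFrog implementation: the importance scores below are made-up stand-ins for the RJMCMC output, a nearest-centroid classifier stands in for the modified diPLS model, and the function names `centroid_accuracy` and `sequential_select` and the toy data are hypothetical. It only shows one common form of the idea: features are tried in descending order of importance, and each is kept only if it improves accuracy on held-out data.

```python
def centroid_accuracy(X_train, y_train, X_val, y_val, cols):
    """Validation accuracy of a nearest-centroid classifier restricted to `cols`."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
    correct = 0
    for x, y in zip(X_val, y_val):
        # Predict the label whose centroid is closest in squared Euclidean distance.
        pred = min(
            centroids,
            key=lambda lab: sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[lab])),
        )
        correct += pred == y
    return correct / len(y_val)


def sequential_select(X_train, y_train, X_val, y_val, importance):
    """Try features in descending importance; keep one only if accuracy improves."""
    order = sorted(range(len(importance)), key=lambda j: -importance[j])
    selected, best = [], 0.0
    for j in order:
        score = centroid_accuracy(X_train, y_train, X_val, y_val, selected + [j])
        if score > best:
            selected, best = selected + [j], score
    return selected, best


# Toy data (assumption): feature 0 tracks the label, features 1 and 2 are noise.
y = [i % 2 for i in range(40)]
X = [
    [y[i] + 0.05 * ((i % 5) - 2), (i * 7 % 11) / 11, (i * 3 % 13) / 13]
    for i in range(40)
]
importance = [0.9, 0.2, 0.1]  # hypothetical scores, not produced by RJMCMC

selected, best = sequential_select(X[:30], y[:30], X[30:], y[30:], importance)
print(selected, best)
```

On this toy data only the informative feature is retained, because adding either noise feature cannot raise the held-out accuracy any further. A real pipeline would use cross-validation rather than a single split and would score candidate subsets with the actual prediction model.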
IsoFrog is freely available at https://github.com/genemine/IsoFrog.