Xie Yue, Jing Zehua, Pan Hailin, Xu Xun, Fang Qi
College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China.
BGI Research, Shenzhen, 518083, China.
BMC Bioinformatics. 2025 Apr 15;26(1):104. doi: 10.1186/s12859-025-06112-5.
Single-cell RNA sequencing allows for the exploration of transcriptomic features at the individual cell level, but the high dimensionality and sparsity of the data pose substantial challenges for downstream analysis. Feature selection, therefore, is a critical step to reduce dimensionality and enhance interpretability.
We developed GLP, a robust feature selection algorithm that uses optimized locally estimated scatterplot smoothing (LOESS) regression to capture the relationship between each gene's average expression level and its positive ratio while minimizing overfitting. In our evaluations, GLP consistently outperformed eight leading feature selection methods across three benchmark criteria and improved downstream analysis.
By preserving key biological information through feature selection, GLP provides informative features to enhance the accuracy and effectiveness of downstream analyses.
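The core idea described above, fitting a LOESS curve between positive ratio and average expression and keeping the genes that deviate most from it, can be illustrated with a minimal sketch. This is not the published GLP implementation. It makes several assumptions that the abstract does not state: positive ratio is taken to be the fraction of cells with a nonzero count, it is used as the predictor, mean expression is log1p-transformed, the LOESS span `frac` is fixed rather than optimized, and genes are ranked by positive residual. The function names `gene_stats`, `loess_fit`, and `select_features` are hypothetical.

```python
import math


def gene_stats(counts):
    """Return (positive_ratio, log1p mean expression) for each gene.

    `counts` is a genes x cells matrix given as a list of lists.
    Assumption: positive ratio = fraction of cells with a nonzero count.
    """
    stats = []
    for row in counts:
        n_cells = len(row)
        positive_ratio = sum(1 for c in row if c > 0) / n_cells
        mean_expr = math.log1p(sum(row) / n_cells)
        stats.append((positive_ratio, mean_expr))
    return stats


def loess_fit(xs, ys, frac=0.5):
    """Local linear regression with tricube weights, evaluated at every x.

    Assumption: a fixed span `frac`; the paper's span optimization is not shown.
    """
    n = len(xs)
    k = max(2, math.ceil(frac * n))  # number of neighbours in each local window
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 1e-12 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def select_features(counts, n_top, frac=0.5):
    """Return indices of the n_top genes with the largest positive residuals."""
    stats = gene_stats(counts)
    xs = [s[0] for s in stats]
    ys = [s[1] for s in stats]
    fitted = loess_fit(xs, ys, frac)
    residuals = [y - f for y, f in zip(ys, fitted)]
    order = sorted(range(len(counts)), key=lambda i: residuals[i], reverse=True)
    return order[:n_top]
```

Calling `select_features(counts, n_top=2000)` on a genes x cells count matrix would return the indices of the 2000 genes whose expression is highest relative to what their positive ratio predicts. The pure-Python loops keep the sketch dependency-free but scale quadratically with the number of genes, so a real dataset would need a vectorized LOESS.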