Lieber Institute for Brain Development, Baltimore, MD 21205, United States.
Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, United States.
Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad513.
Advances in technology have generated larger omics datasets with potential applications for machine learning. In many datasets, however, cost and limited sample availability result in far more features than observations. Moreover, biological processes are associated with networks of core and peripheral genes, while traditional feature selection approaches capture only core genes.
To overcome these limitations, we present dRFEtools, which implements dynamic recursive feature elimination (RFE). Compared to standard RFE, it reduces computational time while maintaining high accuracy, extends dynamic RFE to regression algorithms, and outputs the subsets of features that hold predictive power with and without peripheral features. dRFEtools integrates with scikit-learn (the popular Python machine learning platform) and thus provides new opportunities for dynamic RFE in large-scale omics data while enhancing its interpretability.
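To illustrate why a dynamic elimination scheme can reduce computational time, the following minimal sketch compares how many model fits each schedule would require. It assumes that "dynamic" means dropping a fixed fraction of the remaining features at every iteration, whereas standard RFE drops a fixed number. The function names `dynamic_schedule` and `fixed_schedule` and the parameter `elimination_rate` are hypothetical and are not taken from the dRFEtools API.

```python
import math


def dynamic_schedule(n_features, elimination_rate=0.2, min_features=1):
    """Feature counts visited when a fraction of the remaining features
    is dropped at each iteration (assumed form of dynamic RFE)."""
    if not 0 < elimination_rate < 1:
        raise ValueError("elimination_rate must be between 0 and 1")
    counts = [n_features]
    remaining = n_features
    while remaining > min_features:
        # Drop a fraction of what is left, but always at least one feature.
        n_drop = max(1, math.floor(remaining * elimination_rate))
        remaining = max(min_features, remaining - n_drop)
        counts.append(remaining)
    return counts


def fixed_schedule(n_features, step=1, min_features=1):
    """Feature counts visited when a fixed number of features is dropped
    at each iteration (standard RFE with a constant step)."""
    if step < 1:
        raise ValueError("step must be at least 1")
    counts = [n_features]
    remaining = n_features
    while remaining > min_features:
        remaining = max(min_features, remaining - step)
        counts.append(remaining)
    return counts


if __name__ == "__main__":
    # Each entry in a schedule corresponds to one model fit.
    n = 20000  # hypothetical number of features, e.g. genes
    print("dynamic fits:", len(dynamic_schedule(n, elimination_rate=0.2)))
    print("fixed fits:  ", len(fixed_schedule(n, step=1)))
```

Under these assumptions, the number of iterations grows roughly logarithmically with the number of features for the fractional schedule and linearly for the fixed-step schedule. The fractional schedule also takes small steps once few features remain, which is where fine resolution matters most.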
dRFEtools is freely available on PyPI at https://pypi.org/project/drfetools/ and on GitHub at https://github.com/LieberInstitute/dRFEtools. It is implemented in Python 3 and supported on Linux, Windows, and Mac OS.