Department of Computer Science, University of Crete, Voutes Campus, Heraklion, 70013, Greece.
BMC Bioinformatics. 2018 Jan 23;19(1):17. doi: 10.1186/s12859-018-2023-7.
Feature selection is commonly employed for identifying collectively predictive biomarkers and biosignatures; it facilitates the construction of small statistical models that are easier to verify, visualize, and comprehend while providing insight to the human expert. In this work we extend established constraint-based feature-selection methods to high-dimensional "omics" temporal data, where the number of measurements is orders of magnitude larger than the sample size. The extension required the development of conditional independence tests for temporal and/or static variables conditioned on a set of temporal variables.
The algorithm can return multiple equivalent solution subsets of variables, scales to tens of thousands of features, and performs on par with or better than existing methods, depending on the specifics of the analysis task.
We suggest using this algorithm for variable selection with high-dimensional temporal data.
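To make the idea of constraint-based feature selection driven by conditional independence tests more concrete, the following is a minimal sketch under simplifying assumptions. It is not the algorithm described in this work: it handles static (non-temporal) continuous data only, it substitutes a Fisher z-test on partial correlations for the temporal conditional independence tests developed here, and it returns a single subset rather than multiple equivalent ones. The function names `partial_corr_pvalue` and `forward_select`, and the parameters `alpha` and `max_vars`, are hypothetical and chosen for illustration.

```python
import math
import numpy as np


def partial_corr_pvalue(x, y, Z):
    """Two-sided Fisher z-test p-value for the partial correlation of x and y
    given the columns of Z. Assumes roughly linear, Gaussian relationships."""
    n = len(x)
    # Regress x and y on Z (plus an intercept) and keep the residuals.
    design = np.column_stack([np.ones(n), Z])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    r = float(np.clip(np.corrcoef(rx, ry)[0, 1], -0.999999, 0.999999))
    z = math.atanh(r) * math.sqrt(n - Z.shape[1] - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def forward_select(X, y, alpha=0.05, max_vars=5):
    """Greedy forward selection (illustrative): repeatedly add the variable most
    strongly associated with y given the variables selected so far, and discard
    candidates that appear conditionally independent of y."""
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_vars:
        Z = X[:, selected]  # shape (n, 0) on the first pass
        pvals = {j: partial_corr_pvalue(X[:, j], y, Z) for j in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] > alpha:
            break  # nothing left that is significantly associated with y
        selected.append(best)
        # Early dropping: keep only candidates still associated with y.
        remaining = [j for j in remaining if j != best and pvals[j] <= alpha]
    return selected


if __name__ == "__main__":
    # Synthetic data (an assumption for this demo): y depends on columns 3 and 7.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))
    y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.5 * rng.normal(size=200)
    print(forward_select(X, y))
```

Because every candidate is tested at level `alpha`, a few spurious variables may be selected alongside the true ones; the cap `max_vars` bounds the size of the conditioning set so that the test stays usable when the sample size is small relative to the number of features.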