Family Rank: a graphical domain knowledge informed feature ranking algorithm.

Authors

Saul Michelle, Dinu Valentin

Affiliations

College of Health Solutions, Arizona State University, Tempe, AZ 85287-9020, USA.

Caris Life Sciences, Tempe, AZ 85281, USA.

Publication information

Bioinformatics. 2021 Oct 25;37(20):3626-3631. doi: 10.1093/bioinformatics/btab387.

DOI:10.1093/bioinformatics/btab387

PMID:34009295

Abstract

MOTIVATION

When designing prediction models built with many features and relatively small sample sizes, feature selection methods often overfit training data, leading to selection of irrelevant features. One way to potentially mitigate overfitting is to incorporate domain knowledge during feature selection. Here, a feature ranking algorithm called 'Family Rank' is presented in which features are ranked based on a combination of graphical domain knowledge and feature scores computed from empirical data.
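The idea of combining graphical domain knowledge with empirical feature scores can be illustrated with a minimal sketch. This is not the Family Rank algorithm itself, whose details the abstract does not give. It is a simplified stand-in that blends each feature's own score with the strongest weighted score among its graph neighbors. The function name `rank_with_graph`, the blending parameter `alpha`, and the example scores and edge weights are all hypothetical.

```python
def rank_with_graph(scores, edges, alpha=0.5):
    """Rank features by blending empirical scores with graph-neighbor support.

    Illustrative only: this is NOT the published Family Rank procedure.

    scores: dict mapping feature name -> non-negative empirical score
    edges:  iterable of (feature_a, feature_b, weight) tuples; the weight
            (assumed to lie in [0, 1]) encodes domain-knowledge strength
    alpha:  hypothetical blending weight given to the graph term
    """
    # Build an undirected adjacency list from the edge list.
    neighbors = {}
    for a, b, w in edges:
        neighbors.setdefault(a, []).append((b, w))
        neighbors.setdefault(b, []).append((a, w))

    blended = {}
    for feature, score in scores.items():
        # Support = best neighbor score, scaled by the edge weight.
        support = max(
            (w * scores.get(n, 0.0) for n, w in neighbors.get(feature, [])),
            default=0.0,
        )
        blended[feature] = (1 - alpha) * score + alpha * support

    # Highest blended score first.
    return sorted(blended, key=blended.get, reverse=True)


# Hypothetical data: B has a modest score but is linked to the strong feature A.
example_scores = {"A": 0.9, "B": 0.5, "C": 0.6, "D": 0.1}
example_edges = [("A", "B", 0.8)]

print(rank_with_graph(example_scores, example_edges))
print(rank_with_graph(example_scores, example_edges, alpha=0.0))
```

With `alpha=0.0` the ranking follows the empirical scores alone, so C ranks above B. With the default blending, B moves ahead of C because it is connected to the high-scoring feature A, which is the kind of effect domain knowledge is meant to contribute.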

RESULTS

A simulated dataset is used to demonstrate a scenario in which Family Rank outperforms other state-of-the-art graph-based ranking algorithms, decreasing the sample size needed to detect true predictors by 2- to 3-fold. An example from oncology is then used to explore a real-world application of Family Rank.

AVAILABILITY AND IMPLEMENTATION

An implementation of Family Rank is freely available at https://cran.r-project.org/package=FamilyRank.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
