Application of Biological Domain Knowledge Based Feature Selection on Gene Expression Data.

Authors

Yousef Malik, Kumar Abhishek, Bakir-Gungor Burcu

Affiliations

Department of Information Systems, Zefat Academic College, Zefat 13206, Israel.

Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat 13206, Israel.

Publication information

Entropy (Basel). 2020 Dec 22;23(1):2. doi: 10.3390/e23010002.

DOI:10.3390/e23010002

PMID:33374969

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7821996/

Abstract

In the last two decades, there have been massive advancements in high throughput technologies, which resulted in the exponential growth of public repositories of gene expression datasets for various phenotypes. It is possible to unravel biomarkers by comparing the gene expression levels under different conditions, such as disease vs. control, treated vs. not treated, drug A vs. drug B, etc. This problem refers to a well-studied problem in the machine learning domain, i.e., the feature selection problem. In biological data analysis, most of the computational feature selection methodologies were taken from other fields, without considering the nature of the biological data. Thus, integrative approaches that utilize the biological knowledge while performing feature selection are necessary for this kind of data. The main idea behind the integrative gene selection process is to generate a ranked list of genes considering both the statistical metrics that are applied to the gene expression data, and the biological background information which is provided as external datasets. One of the main goals of this review is to explore the existing methods that integrate different types of information in order to improve the identification of the biomolecular signatures of diseases and the discovery of new potential targets for treatment. These integrative approaches are expected to aid the prediction, diagnosis, and treatment of diseases, as well as to enlighten us on disease state dynamics, mechanisms of their onset and progression. The integration of various types of biological information will necessitate the development of novel techniques for integration and data analysis. Another aim of this review is to boost the bioinformatics community to develop new approaches for searching and determining significant groups/clusters of features based on one or more biological grouping functions.
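The integrative gene selection idea described in the abstract (a ranked gene list that reflects both a statistical metric computed on the expression data and externally supplied biological background information) can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the review: the Welch t-statistic as the statistical metric, min-max scaling, the weighted-sum combination, the `weight` parameter, the toy gene names, and the hypothetical `prior` scores (for example, the number of disease-related pathways that contain each gene).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t-statistic between two groups of expression values."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def min_max(scores):
    """Rescale a gene -> score mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {g: (s - lo) / span if span else 0.0 for g, s in scores.items()}


def integrative_rank(disease, control, prior, weight=0.5):
    """Rank genes by a weighted mix of data-driven and knowledge-driven scores.

    disease, control: gene -> list of expression values per condition.
    prior: gene -> biological relevance score (hypothetical external data).
    weight: share given to the biological prior (0 = statistics only).
    """
    stat = min_max({g: welch_t(disease[g], control[g]) for g in disease})
    bio = min_max({g: float(prior.get(g, 0.0)) for g in disease})
    combined = {g: (1 - weight) * stat[g] + weight * bio[g] for g in disease}
    return sorted(combined, key=combined.get, reverse=True)


# Toy expression values (invented for illustration only).
disease = {
    "G1": [5.1, 5.3, 4.9, 5.2],
    "G2": [2.0, 2.1, 1.9, 2.2],
    "G3": [7.0, 7.4, 6.8, 7.1],
}
control = {
    "G1": [5.0, 5.2, 5.1, 4.9],
    "G2": [1.0, 1.2, 0.9, 1.1],
    "G3": [6.5, 6.9, 6.4, 7.0],
}
# Hypothetical prior: e.g. count of relevant pathways containing each gene.
prior = {"G1": 0, "G2": 1, "G3": 3}

if __name__ == "__main__":
    for w in (0.0, 0.5, 1.0):
        print(w, integrative_rank(disease, control, prior, weight=w))
```

On this toy data, G2 ranks first when only the expression statistics are used (`weight=0.0`), while G3 moves to the top when only the prior is used (`weight=1.0`). The weighted sum is just one simple way to combine the two sources; the choice of combination rule and weight is an open design decision.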
