Identifying interactions in omics data for clinical biomarker discovery using symbolic regression.

Affiliations

Department of Chemistry, University of Copenhagen, Copenhagen 1871, Denmark.

Abzu ApS, Copenhagen 2150, Denmark.

Publication information

Bioinformatics. 2022 Aug 2;38(15):3749-3758. doi: 10.1093/bioinformatics/btac405.

DOI:10.1093/bioinformatics/btac405

PMID:35731214

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9344843/

Abstract

MOTIVATION

The identification of predictive biomarker signatures from omics and multi-omics data for clinical applications is an active area of research. Recent developments in assay technologies and machine learning (ML) methods have led to significant improvements in predictive performance. However, most high-performing ML methods suffer from complex architectures and lack interpretability.

RESULTS

We present the application of a novel symbolic-regression-based algorithm, the QLattice, to a selection of clinical omics datasets. This approach generates parsimonious, high-performing models that can both predict disease outcomes and reveal putative disease mechanisms, demonstrating the importance of selecting maximally relevant and minimally redundant features in omics-based machine-learning applications. The simplicity and high predictive power of these biomarker signatures make them attractive tools for high-stakes applications in areas such as primary care, clinical decision-making and patient stratification.
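The idea of selecting "maximally relevant and minimally redundant" features can be illustrated with a small greedy selection sketch. This is not the QLattice and not the authors' method. It is a minimal illustration that assumes absolute Pearson correlation as the measure of both relevance (feature vs. outcome) and redundancy (feature vs. already selected features). The function names (`pearson`, `mrmr_select`) and the toy data (`gene_a`, `gene_a_copy`, `gene_b`, `outcome`) are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def mrmr_select(features, target, k):
    """Greedily pick k feature names, scoring each candidate as
    relevance minus mean redundancy with the features already picked.

    Assumption: relevance and redundancy are both |Pearson r|.
    """
    relevance = {name: abs(pearson(values, target)) for name, values in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: gene_a_copy is a near-duplicate of gene_a,
# while gene_b carries independent information about the outcome.
features = {
    "gene_a": [1, 1, -1, -1, 1, 1, -1, -1],
    "gene_a_copy": [1.0, 0.9, -1.0, -0.9, 1.1, 1.0, -1.1, -1.0],
    "gene_b": [1, -1, 1, -1, 1, -1, 1, -1],
}
outcome = [3, 1, -1, -3, 3, 1, -1, -3]  # constructed as 2 * gene_a + gene_b

# Ranking by relevance alone keeps the redundant near-duplicate.
by_relevance = sorted(features, key=lambda n: -abs(pearson(features[n], outcome)))[:2]
# Penalising redundancy swaps it for the complementary feature.
by_mrmr = mrmr_select(features, outcome, 2)

print("relevance only:", by_relevance)
print("relevance minus redundancy:", by_mrmr)
```

In this toy setting, ranking by relevance alone selects `gene_a` and its near-duplicate, whereas the redundancy penalty selects `gene_a` and `gene_b`, which together reconstruct the outcome. A short, non-redundant signature of this kind is the sort of feature set the abstract describes as desirable.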

AVAILABILITY AND IMPLEMENTATION

The QLattice is available as part of a Python package (feyn), which is hosted on the Python Package Index (https://pypi.org/project/feyn/) and can be installed via pip. The documentation provides guides, tutorials and the API reference (https://docs.abzu.ai/). All code and data used to generate the models and plots discussed in this work can be found at https://github.com/abzu-ai/QLattice-clinical-omics.

SUPPLEMENTARY INFORMATION

Supplementary material is available at Bioinformatics online.
