Identifying interactions in omics data for clinical biomarker discovery using symbolic regression.
Affiliations
Department of Chemistry, University of Copenhagen, Copenhagen 1871, Denmark.
Abzu ApS, Copenhagen 2150, Denmark.
Publication information
Bioinformatics. 2022 Aug 2;38(15):3749-3758. doi: 10.1093/bioinformatics/btac405.
MOTIVATION
The identification of predictive biomarker signatures from omics and multi-omics data for clinical applications is an active area of research. Recent developments in assay technologies and machine learning (ML) methods have led to significant improvements in predictive performance. However, most high-performing ML methods rely on complex architectures and lack interpretability.
RESULTS
We present the application of a novel symbolic-regression-based algorithm, the QLattice, to a selection of clinical omics datasets. This approach generates parsimonious, high-performing models that can both predict disease outcomes and reveal putative disease mechanisms, demonstrating the importance of selecting maximally relevant and minimally redundant features in omics-based machine-learning applications. The simplicity and high predictive power of these biomarker signatures make them attractive tools for high-stakes applications in areas such as primary care, clinical decision-making and patient stratification.
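The idea of selecting maximally relevant and minimally redundant features can be illustrated with a small, generic sketch. This is not the QLattice algorithm and is not taken from the paper's code; it is a plain greedy selection that, as an assumption, scores relevance and redundancy with absolute Pearson correlation. The function names (`pearson`, `mrmr_select`) and the toy feature names (`gene_a`, `gene_a_dup`, `gene_b`) are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def mrmr_select(features, target, k):
    """Greedily pick k feature names: high relevance to target, low redundancy.

    features: dict mapping feature name -> list of values (hypothetical layout).
    Score = |corr(feature, target)| - mean |corr(feature, already selected)|.
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    remaining = sorted(features)  # sorted for deterministic tie-breaking

    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: gene_a_dup is a near copy of gene_a, gene_b carries independent signal.
features = {
    "gene_a": [1.0, 1.0, -1.0, -1.0],
    "gene_a_dup": [1.0, 0.8, -1.0, -0.8],
    "gene_b": [1.0, -1.0, 1.0, -1.0],
}
target = [3.0, 1.0, -1.0, -3.0]  # equals 2 * gene_a + gene_b

selected = mrmr_select(features, target, k=2)
print(selected)  # ['gene_a', 'gene_b']
```

Ranking by relevance alone would pick `gene_a` and `gene_a_dup`, since both correlate strongly with the target. The redundancy penalty pushes the second pick to `gene_b`, which adds information the first feature lacks.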
AVAILABILITY AND IMPLEMENTATION
The QLattice is available as part of a Python package (feyn), which is hosted on the Python Package Index (https://pypi.org/project/feyn/) and can be installed via pip. The documentation provides guides, tutorials and the API reference (https://docs.abzu.ai/). All code and data used to generate the models and plots discussed in this work can be found at https://github.com/abzu-ai/QLattice-clinical-omics.
SUPPLEMENTARY INFORMATION
Supplementary material is available at Bioinformatics online.