-Omics biomarker identification pipeline for translational medicine.

Affiliations

College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT, UK.

Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, B15 2TT, UK.

Publication information

J Transl Med. 2019 May 14;17(1):155. doi: 10.1186/s12967-019-1912-5.

DOI:10.1186/s12967-019-1912-5

PMID:31088492

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6518609/

Abstract

BACKGROUND

Translational medicine (TM) is an emerging domain that aims to move medical and biological advances efficiently from the scientist to the clinician. Central to the TM vision is narrowing the gap between basic science and applied science in terms of time, cost and early diagnosis of the disease state. Biomarker identification is one of the main challenges within TM. Identifying disease biomarkers from -omics data will not only help stratify diverse patient cohorts but will also provide early diagnostic information, which could improve patient management and potentially prevent adverse outcomes. However, biomarker identification needs to be robust and reproducible. Hence, a robust, unbiased computational framework that can help clinicians identify those biomarkers is necessary.

METHODS

We developed a pipeline (workflow) that includes two supervised classification techniques based on regularization methods to identify biomarkers from -omics or other high-dimensional clinical datasets. The pipeline includes several important steps, such as quality control and an assessment of the stability of the selected biomarkers. It takes input files (outcome and independent variables or -omics data) and pre-processes them (normalization and handling of missing values). After a random division of samples into training and test sets, the Least Absolute Shrinkage and Selection Operator (LASSO) and Elastic Net feature selection methods are applied to identify the most important features, which represent potential biomarker candidates. The penalization parameters are optimised using 10-fold cross-validation, and the process undergoes 100 iterations and a combinatorial analysis to select the best-performing multivariate model. An empirical, unbiased assessment of the quality of the selected features as biomarkers for clinical use is performed through Receiver Operating Characteristic (ROC) curve analysis and the corresponding Area Under the Curve (AUC), on both permuted and real data, for 1000 different randomized training and test sets. We validated this pipeline against previously published biomarkers.
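
The selection and assessment steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the use of scikit-learn, penalized logistic regression as the classifier, the 70/30 split, the function names (`make_selector`, `selection_frequency`, `auc_distribution`) and the synthetic data are all assumptions made for illustration. The combinatorial analysis step is omitted, and selection and evaluation are run on separate sets of random splits for brevity, so a strict evaluation would nest the selection inside each split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_selector(l1_ratio=1.0):
    """Penalized logistic regression with the penalty tuned by 10-fold CV.

    l1_ratio=1.0 gives LASSO; 0 < l1_ratio < 1 gives Elastic Net.
    """
    if l1_ratio == 1.0:
        clf = LogisticRegressionCV(Cs=10, cv=10, penalty="l1", solver="liblinear",
                                   scoring="roc_auc", random_state=0)
    else:
        clf = LogisticRegressionCV(Cs=10, cv=10, penalty="elasticnet", solver="saga",
                                   l1_ratios=[l1_ratio], scoring="roc_auc",
                                   max_iter=5000, random_state=0)
    return make_pipeline(StandardScaler(), clf)


def selection_frequency(X, y, l1_ratio=1.0, n_iter=100, seed=0):
    """Fraction of random training sets in which each feature is selected,
    i.e. receives a non-zero coefficient (a simple stability measure)."""
    rng = np.random.RandomState(seed)
    counts = np.zeros(X.shape[1])
    for _ in range(n_iter):
        X_tr, _, y_tr, _ = train_test_split(
            X, y, test_size=0.3, stratify=y, random_state=rng.randint(2**31 - 1))
        model = make_selector(l1_ratio).fit(X_tr, y_tr)
        counts += model[-1].coef_.ravel() != 0
    return counts / n_iter


def auc_distribution(X, y, features, n_splits=1000, permute=False, seed=0):
    """Test-set AUCs of a model refit on `features` over random splits.

    With permute=True the labels are shuffled before each split, which
    yields a null distribution to compare the real AUCs against.
    """
    rng = np.random.RandomState(seed)
    aucs = []
    for _ in range(n_splits):
        y_used = rng.permutation(y) if permute else y
        X_tr, X_te, y_tr, y_te = train_test_split(
            X[:, features], y_used, test_size=0.3, stratify=y_used,
            random_state=rng.randint(2**31 - 1))
        clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        clf.fit(X_tr, y_tr)
        aucs.append(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
    return np.array(aucs)


if __name__ == "__main__":
    # Synthetic stand-in for an -omics matrix: 120 samples, 30 features;
    # shuffle=False keeps the 4 informative features in the first columns.
    X, y = make_classification(n_samples=120, n_features=30, n_informative=4,
                               n_redundant=0, n_clusters_per_class=1,
                               shuffle=False, class_sep=2.0, random_state=0)
    freq = selection_frequency(X, y, n_iter=5)       # small n_iter for a quick demo
    stable = np.flatnonzero(freq >= 0.8)             # illustrative threshold
    real = auc_distribution(X, y, stable, n_splits=30)
    null = auc_distribution(X, y, stable, n_splits=30, permute=True, seed=1)
    print("stable features:", stable)
    print("mean AUC, real vs permuted:", real.mean(), null.mean())
```

The iteration counts in the demo are kept small so that it runs quickly; the defaults of 100 selection iterations and 1000 evaluation splits mirror the numbers given in the abstract. Passing a value such as `l1_ratio=0.5` to `selection_frequency` switches the selector from LASSO to Elastic Net.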

RESULTS

We applied this pipeline to three different datasets with previously published biomarkers: lipidomics data by Acharjee et al. (Metabolomics 13:25, 2017), and transcriptomics data by Rajamani and Bhasin (Genome Med 8:38, 2016) and by Mills et al. (Blood 114:1063-1072, 2009). Our results demonstrate that our method was able to identify both the previously published biomarkers and new variables that add value to the published results.

CONCLUSIONS

We developed a robust pipeline to identify clinically relevant biomarkers that can be applied to different -omics datasets. Such identification reveals potentially novel drug targets and can be used as part of a machine-learning-based patient stratification framework in translational medicine settings.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6df0/6518609/4df9a862d480/12967_2019_1912_Fig1_HTML.jpg
