Integrating multiple molecular sources into a clinical risk prediction signature by extracting complementary information.

Authors

Hieke Stefanie, Benner Axel, Schlenk Richard F, Schumacher Martin, Bullinger Lars, Binder Harald

Affiliations

Institute for Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Stefan-Meier-Str. 26, Freiburg, 79104, Germany.

Freiburg Center for Data Analysis and Modeling, University Freiburg, Eckerstr. 1, Freiburg, 79104, Germany.

Publication information

BMC Bioinformatics. 2016 Aug 30;17(1):327. doi: 10.1186/s12859-016-1183-6.

DOI:10.1186/s12859-016-1183-6

PMID:27578050

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5004308/

Abstract

BACKGROUND

High-throughput technology allows for genome-wide measurements at different molecular levels for the same patient, e.g. single nucleotide polymorphisms (SNPs) and gene expression. Correspondingly, it might be beneficial to also integrate complementary information from different molecular levels when building multivariable risk prediction models for a clinical endpoint, such as treatment response or survival. Unfortunately, such a high-dimensional modeling task will often be complicated by a limited overlap of molecular measurements at different levels between patients, i.e. measurements from all molecular levels are available only for a smaller proportion of patients.

RESULTS

We propose a sequential strategy for building clinical risk prediction models that integrate genome-wide measurements from two molecular levels in a complementary way. To deal with partial overlap, we develop an imputation approach that allows us to use all available data. This approach is investigated in two acute myeloid leukemia applications combining gene expression with either SNP or DNA methylation data. After obtaining a sparse risk prediction signature e.g. from SNP data, an automatically selected set of prognostic SNPs, by componentwise likelihood-based boosting, imputation is performed for the corresponding linear predictor by a linking model that incorporates e.g. gene expression measurements. The imputed linear predictor is then used for adjustment when building a prognostic signature from the gene expression data. For evaluation, we consider stability, as quantified by inclusion frequencies across resampling data sets. Despite an extremely small overlap in the application example with gene expression and SNPs, several genes are seen to be more stably identified when taking the (imputed) linear predictor from the SNP data into account. In the application with gene expression and DNA methylation, prediction performance with respect to survival also indicates that the proposed approach might work well.
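The sequential strategy described in the Results can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: it uses a continuous outcome with componentwise least-squares boosting as a stand-in for the likelihood-based boosting applied to survival data, the data are simulated, and all names (`componentwise_boost`, `linear_predictor`, the sample sizes and overlap pattern) are hypothetical. Keeping the observed SNP linear predictor for overlap patients and imputing only where SNPs are missing is likewise a choice made here for illustration.

```python
import numpy as np

def componentwise_boost(X, y, offset=None, steps=50, nu=0.1):
    """Componentwise least-squares boosting (stand-in for likelihood-based boosting).
    Each step updates only the single covariate that most reduces the residual
    sum of squares, which yields a sparse coefficient vector."""
    n, p = X.shape
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0                      # guard against constant columns
    Z = (X - mu) / sd
    off = np.zeros(n) if offset is None else np.asarray(offset, dtype=float)
    intercept = np.mean(y - off)
    resid = y - off - intercept
    beta = np.zeros(p)
    for _ in range(steps):
        coef = Z.T @ resid / n             # per-covariate least-squares estimate
        j = np.argmax(np.abs(coef))        # best-fitting single covariate
        beta[j] += nu * coef[j]            # shrunken update of that covariate only
        resid -= nu * coef[j] * Z[:, j]
    return intercept, beta, mu, sd

def linear_predictor(model, X):
    intercept, beta, mu, sd = model
    return intercept + ((X - mu) / sd) @ beta

# Simulated data with partial overlap (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, p_snp, p_expr = 200, 30, 40
snp = rng.integers(0, 3, size=(n, p_snp)).astype(float)
expr = rng.normal(size=(n, p_expr))
expr[:, 0] += 0.8 * (snp[:, 0] - 1)        # expression partly reflects SNP 0
y = (snp[:, 0] - 1) + expr[:, 1] + rng.normal(scale=0.5, size=n)

has_snp = np.arange(n) < 120               # SNPs measured for patients 0..119
has_expr = np.arange(n) >= 80              # expression measured for patients 80..199
overlap = has_snp & has_expr               # both levels only for patients 80..119

# Step 1: sparse signature from the SNP data.
snp_model = componentwise_boost(snp[has_snp], y[has_snp])

# Step 2: linking model, fitted on the overlap, that predicts the SNP
# linear predictor from gene expression.
link_model = componentwise_boost(expr[overlap],
                                 linear_predictor(snp_model, snp[overlap]))

# Step 3: impute the SNP linear predictor where SNPs are missing.
eta = np.full(n, np.nan)
eta[has_snp] = linear_predictor(snp_model, snp[has_snp])
missing = has_expr & ~has_snp
eta[missing] = linear_predictor(link_model, expr[missing])

# Step 4: gene expression signature, adjusted for the (imputed) SNP
# linear predictor by entering it as a fixed offset.
expr_model = componentwise_boost(expr[has_expr], y[has_expr], offset=eta[has_expr])
selected_genes = np.flatnonzero(expr_model[1])
print("selected gene indices:", selected_genes)
```

Entering the linear predictor as an offset means the expression signature only has to explain what the SNP signature does not, which is one way to read "complementary" in this setting.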

CONCLUSIONS

We consider imputation of linear predictor values to be a feasible and sensible approach for dealing with partial overlap in complementary integrative analysis of molecular measurements at different levels. More generally, these results indicate that a complementary strategy for integrating different molecular levels can result in more stable risk prediction signatures, potentially providing a more reliable insight into the underlying biology.
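Stability in the sense used above, inclusion frequencies across resampling data sets, can be computed along these lines. The sketch assumes subsampling without replacement at a fraction of 0.632 and uses a deliberately simple selector (top-k covariates by absolute correlation) as a placeholder for the boosting-based selection; both the resampling scheme and the function names are assumptions made for illustration.

```python
import numpy as np

def inclusion_frequencies(select, X, y, n_resamples=100, fraction=0.632, seed=0):
    """Fraction of resampling data sets in which each covariate is selected.
    `select(X, y)` must return the indices of the selected covariates."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = int(round(fraction * n))
    counts = np.zeros(p)
    for _ in range(n_resamples):
        idx = rng.choice(n, size=m, replace=False)   # subsample without replacement
        counts[select(X[idx], y[idx])] += 1
    return counts / n_resamples

def top_k_by_correlation(X, y, k=3):
    """Placeholder selector: the k covariates most correlated with the outcome."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf                       # constant columns get zero correlation
    corr = Xc.T @ yc / denom
    return np.argsort(-np.abs(corr))[:k]

# Simulated example: only covariate 0 carries signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 20))
y = 2.0 * X[:, 0] + rng.normal(size=150)

freq = inclusion_frequencies(top_k_by_correlation, X, y)
print(np.round(freq, 2))
```

A covariate with a real effect should be selected in nearly every resample, while noise covariates share the remaining selections and end up with low frequencies. Comparing these frequencies with and without adjustment for the other molecular level is the kind of contrast the abstract refers to.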

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e948/5004308/745ca30a4025/12859_2016_1183_Fig1_HTML.jpg

Similar articles

Integrating multiple molecular sources into a clinical risk prediction signature by extracting complementary information.

BMC Bioinformatics. 2016 Aug 30;17(1):327. doi: 10.1186/s12859-016-1183-6.

Tailoring sparse multivariable regression techniques for prognostic single-nucleotide polymorphism signatures.

Stat Med. 2013 May 10;32(10):1778-91. doi: 10.1002/sim.5490. Epub 2012 Jul 11.

A multivariable approach for risk markers from pooled molecular data with only partial overlap.

BMC Med Genet. 2019 Jul 19;20(1):128. doi: 10.1186/s12881-019-0849-0.

Identifying Prognostic SNPs in Clinical Cohorts: Complementing Univariate Analyses by Resampling and Multivariable Modeling.

PLoS One. 2016 May 9;11(5):e0155226. doi: 10.1371/journal.pone.0155226. eCollection 2016.

A weighting approach for judging the effect of patient strata on high-dimensional risk prediction signatures.

BMC Bioinformatics. 2015 Sep 15;16:294. doi: 10.1186/s12859-015-0716-8.

A boosting approach for adapting the sparsity of risk prediction signatures based on different molecular levels.

Stat Appl Genet Mol Biol. 2014 Jun;13(3):343-57. doi: 10.1515/sagmb-2013-0050.

Seven-CpG-based prognostic signature coupled with gene expression predicts survival of oral squamous cell carcinoma.

Clin Epigenetics. 2017 Aug 24;9:88. doi: 10.1186/s13148-017-0392-9. eCollection 2017.

Transforming RNA-Seq data to improve the performance of prognostic gene signatures.

PLoS One. 2014 Jan 8;9(1):e85150. doi: 10.1371/journal.pone.0085150. eCollection 2014.

An integrated approach of gene expression and DNA-methylation profiles of WNT signaling genes uncovers novel prognostic markers in acute myeloid leukemia.

BMC Bioinformatics. 2015;16 Suppl 4(Suppl 4):S4. doi: 10.1186/1471-2105-16-S4-S4. Epub 2015 Feb 23.

Cluster-localized sparse logistic regression for SNP data.

Stat Appl Genet Mol Biol. 2012 Aug 14;11(4):/j/sagmb.2012.11.issue-4/1544-6115.1694/1544-6115.1694.xml. doi: 10.1515/1544-6115.1694.

Cited by

Cerebrospinal Fluid Metabolomics and Proteomics Integration in Neurological Syndromes.

Methods Mol Biol. 2025;2914:303-321. doi: 10.1007/978-1-0716-4462-1_21.

Protein Kinase C Epsilon Overexpression Is Associated With Poor Patient Outcomes in AML and Promotes Daunorubicin Resistance Through p-Glycoprotein-Mediated Drug Efflux.

Front Oncol. 2022 May 30;12:840046. doi: 10.3389/fonc.2022.840046. eCollection 2022.

A multivariable approach for risk markers from pooled molecular data with only partial overlap.

BMC Med Genet. 2019 Jul 19;20(1):128. doi: 10.1186/s12881-019-0849-0.

A strategy for high-dimensional multivariable analysis classifies childhood asthma phenotypes from genetic, immunological, and environmental factors.

Allergy. 2019 Jul;74(7):1364-1373. doi: 10.1111/all.13745. Epub 2019 Mar 31.

A robust fuzzy rule based integrative feature selection strategy for gene expression data in TCGA.

BMC Med Genomics. 2019 Jan 31;12(Suppl 1):14. doi: 10.1186/s12920-018-0451-x.
