Expanding the understanding of biases in development of clinical-grade molecular signatures: a case study in acute respiratory viral infections.

Affiliation

Center for Health Informatics and Bioinformatics, New York University School of Medicine, New York, New York, United States of America.

Publication information

PLoS One. 2011;6(6):e20662. doi: 10.1371/journal.pone.0020662. Epub 2011 Jun 1.

Abstract

BACKGROUND

The promise of modern personalized medicine is to use molecular and clinical information to better diagnose, manage, and treat disease on an individual patient basis. These functions are predominantly enabled by molecular signatures, which are computational models for predicting phenotypes and other responses of interest from high-throughput assay data. Data analysis is a central component of molecular signature development and can jeopardize the entire process if conducted incorrectly. While exploratory data analysis may tolerate suboptimal protocols, clinical-grade molecular signatures are subject to vastly stricter requirements. Closing the gap between standards for exploratory versus clinically successful molecular signatures entails a thorough understanding of possible biases in the data analysis phase and developing strategies to avoid them.

METHODOLOGY AND PRINCIPAL FINDINGS

Using a recently introduced data-analytic protocol as a case study, we provide an in-depth examination of the poorly studied biases of the data-analytic protocols related to signature multiplicity, biomarker redundancy, data preprocessing, and validation of signature reproducibility. The methodology and results presented in this work are aimed at expanding the understanding of these data-analytic biases that affect development of clinically robust molecular signatures.

CONCLUSIONS AND SIGNIFICANCE

Several recommendations follow from the current study. First, all molecular signatures of a phenotype should be extracted to the extent possible, in order to provide comprehensive and accurate grounds for understanding disease pathogenesis. Second, redundant genes should generally be removed from final signatures to facilitate reproducibility and decrease manufacturing costs. Third, data preprocessing procedures should be designed so as not to bias biomarker selection. Finally, molecular signatures developed and applied on different phenotypes and populations of patients should be treated with great caution.
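To make the second recommendation concrete, the following is a minimal sketch of one simple way to prune redundant genes from a ranked candidate list. It assumes redundancy is approximated by high pairwise Pearson correlation, which is a common heuristic and not necessarily the criterion used in the study; the abstract does not specify one. The function names, the 0.95 threshold, and the toy expression values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / sqrt(sxx * syy)


def drop_redundant(expression, ranked_genes, threshold=0.95):
    """Greedily keep genes in rank order, skipping any gene whose profile
    is highly correlated (|r| >= threshold) with an already kept gene.

    Assumption: pairwise correlation stands in for redundancy here.
    """
    kept = []
    for gene in ranked_genes:
        if all(abs(pearson(expression[gene], expression[k])) < threshold
               for k in kept):
            kept.append(gene)
    return kept


# Toy, made-up expression values across five samples (not data from the study).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 4.0, 6.2, 8.1, 9.9],  # nearly a scaled copy of geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
signature = drop_redundant(expression, ["geneA", "geneB", "geneC"])
print(signature)
```

In this toy example geneB is dropped because it tracks geneA almost perfectly, while geneC is kept. A pairwise filter of this kind cannot detect a gene that is redundant only given a combination of other genes, so it should be read as an illustration of the idea and not as a complete procedure.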

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/036a/3105991/ff976e8d8c5f/pone.0020662.g001.jpg
