Reproducibility of biomarker identifications from mass spectrometry proteomic data in cancer studies.

Author information

Liang Yulan, Kelemen Adam, Kelemen Arpad

Affiliations

Department of Family and Community Health, University of Maryland, Baltimore, MD 21201-1579, USA.

Department of Computer Science, University of Maryland, College Park, MD 20742, USA.

Publication information

Stat Appl Genet Mol Biol. 2019 May 11;18(3):sagmb-2018-0039. doi: 10.1515/sagmb-2018-0039.

Abstract

Reproducibility of disease signatures and clinical biomarkers in multi-omics disease analysis has been a key challenge due to a multitude of factors. The heterogeneity of limited samples, biological factors such as environmental confounders, and inherent experimental and technical noise, compounded by inadequate statistical tools, can lead to misinterpretation of results and, subsequently, to very different biological conclusions. In this paper, we investigate biomarker reproducibility issues potentially caused by differences among statistical methods with varied distribution assumptions or marker selection criteria, using mass spectrometry proteomic data from ovarian tumors. We examine the relationship between effect sizes, p values, Cauchy p values, False Discovery Rate p values, and the rank fractions of identified proteins out of thousands in the limited heterogeneous sample. We compare the markers identified by statistical single-feature selection approaches with those from machine learning wrapper methods. The results reveal marked differences among the protein markers selected by the various methods, with potential selection biases and false discoveries, which may be due to small effects, different distribution assumptions, and p value type criteria versus prediction accuracies. Alternative solutions and other related issues are discussed in support of reproducible findings for clinically actionable outcomes.
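The relationship between effect sizes, raw p values, and False Discovery Rate adjusted p values described in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not the authors' pipeline: the data are synthetic Gaussian draws rather than ovarian tumor spectra, the per-feature test is a Mann-Whitney U test with a normal approximation (no tie or continuity correction), the effect size is Cohen's d, and the adjustment is the Benjamini-Hochberg step-up procedure. The function names (`cohens_d`, `mann_whitney_p`, `bh_adjust`) are hypothetical helpers.

```python
import math
import random
from statistics import NormalDist, mean, variance


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U p value, normal approximation, assumes no ties."""
    na, nb = len(a), len(b)
    u = sum(1 for x in a for y in b if x > y)
    mu = na * nb / 2
    sigma = math.sqrt(na * nb * (na + nb + 1) / 12)
    z = (u - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Synthetic stand-in for a small, noisy study: many features, few samples.
rng = random.Random(0)
n_features, n_true, n_per_group = 200, 10, 10  # illustrative sizes only
effects, pvals = [], []
for j in range(n_features):
    shift = 1.5 if j < n_true else 0.0  # only the first n_true features differ
    cases = [rng.gauss(shift, 1.0) for _ in range(n_per_group)]
    controls = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    effects.append(cohens_d(cases, controls))
    pvals.append(mann_whitney_p(cases, controls))

qvals = bh_adjust(pvals)

# Compare two selection criteria: smallest p values vs largest absolute effects.
top_by_p = set(sorted(range(n_features), key=lambda j: pvals[j])[:n_true])
top_by_d = set(sorted(range(n_features), key=lambda j: -abs(effects[j]))[:n_true])

print("raw p < 0.05:", sum(p < 0.05 for p in pvals))
print("BH q < 0.05:", sum(q < 0.05 for q in qvals))
print("overlap of top-10 by p and top-10 by |d|:", len(top_by_p & top_by_d))
```

Changing the random seed, the sample size per group, or the test used for the p values typically changes which features land in each top-10 set, which is one simple way to see how selection criteria alone can affect the markers reported.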
