Analyzing biomarker discovery: Estimating the reproducibility of biomarker sets.

Affiliations

Department of Computing Science, University of Alberta, Edmonton, Canada.

Department of Pure Math, University of Waterloo, Waterloo, ON, Canada.

Publication information

PLoS One. 2022 Jul 28;17(7):e0252697. doi: 10.1371/journal.pone.0252697. eCollection 2022.

DOI:10.1371/journal.pone.0252697
PMID:35901020
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9333302/
Abstract

Many researchers try to understand a biological condition by identifying biomarkers. This is typically done using univariate hypothesis testing over a labeled dataset, declaring a feature to be a biomarker if there is a significant statistical difference between its values for the subjects with different outcomes. However, such sets of proposed biomarkers are often not reproducible - subsequent studies often fail to identify the same sets. Indeed, there is often only a very small overlap between the biomarkers proposed in pairs of related studies that explore the same phenotypes over the same distribution of subjects. This paper first defines the Reproducibility Score for a labeled dataset as a measure (taking values between 0 and 1) of the reproducibility of the results produced by a specified fixed biomarker discovery process for a given distribution of subjects. We then provide ways to reliably estimate this score by defining algorithms that produce an over-bound and an under-bound for this score for a given dataset and biomarker discovery process, for the case of univariate hypothesis testing on dichotomous groups. We confirm that these approximations are meaningful by providing empirical results on a large number of datasets and show that these predictions match known reproducibility results. To encourage others to apply this technique to analyze their biomarker sets, we have also created a publicly available website, https://biomarker.shinyapps.io/BiomarkerReprod/, that produces these Reproducibility Score approximations for any given dataset (with continuous or discrete features and binary class labels).
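The idea of scoring how reproducible a univariate biomarker discovery process is can be illustrated with a rough sketch. The code below is not the authors' algorithm and does not compute their over-bound or under-bound; it is a minimal illustration under stated assumptions: overlap is measured with the Jaccard index, significance uses a normal approximation to Welch's t-test with a Bonferroni correction, and reproducibility is approximated by repeatedly splitting one dataset into two disjoint halves. All function names (`discover_biomarkers`, `jaccard`, `split_half_overlap`) are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance


def discover_biomarkers(X, y, alpha=0.05):
    """Return the set of feature indices flagged as biomarkers.

    X is a list of subject rows, y a list of 0/1 labels. Each feature is
    tested on its own (univariate) against a Bonferroni-corrected threshold.
    Assumption: the p-value uses a normal approximation to Welch's t-test.
    """
    n_features = len(X[0])
    threshold = alpha / n_features
    selected = set()
    for j in range(n_features):
        g0 = [row[j] for row, label in zip(X, y) if label == 0]
        g1 = [row[j] for row, label in zip(X, y) if label == 1]
        if len(g0) < 2 or len(g1) < 2:
            continue  # not enough subjects to estimate a variance
        se = (variance(g0) / len(g0) + variance(g1) / len(g1)) ** 0.5
        if se == 0:
            continue  # both groups have zero variance; statistic undefined
        z = abs(mean(g0) - mean(g1)) / se
        p_value = 2 * (1 - NormalDist().cdf(z))
        if p_value < threshold:
            selected.add(j)
    return selected


def jaccard(a, b):
    """Overlap of two sets in [0, 1].

    Assumption: two empty sets are treated as identical (overlap 1.0).
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def split_half_overlap(X, y, n_splits=50, alpha=0.05, seed=0):
    """Average Jaccard overlap of biomarker sets from random disjoint halves."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    half = len(indices) // 2
    total = 0.0
    for _ in range(n_splits):
        rng.shuffle(indices)
        first, second = indices[:half], indices[half:]
        set_a = discover_biomarkers(
            [X[i] for i in first], [y[i] for i in first], alpha)
        set_b = discover_biomarkers(
            [X[i] for i in second], [y[i] for i in second], alpha)
        total += jaccard(set_a, set_b)
    return total / n_splits


if __name__ == "__main__":
    rng = random.Random(1)
    y = [i % 2 for i in range(60)]
    # Toy data: feature 0 differs between groups, features 1-9 are pure noise.
    X = [[rng.gauss(3.0 * label, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(9)]
         for label in y]
    print("biomarkers on full data:", sorted(discover_biomarkers(X, y)))
    print("split-half overlap:", round(split_half_overlap(X, y), 3))
```

A value near 1 suggests that the same features tend to be selected in both halves, while a value near 0 suggests that the selected set depends heavily on which subjects were sampled. Because each half contains only half the subjects, this proxy has less statistical power than a study at the full sample size, so it should be read as a rough indication only.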

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2433/9333302/497bb42e8846/pone.0252697.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2433/9333302/0f1b10ddcf7c/pone.0252697.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2433/9333302/f27164c707b0/pone.0252697.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2433/9333302/3b2e5b684a0f/pone.0252697.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2433/9333302/9655e2b96a1c/pone.0252697.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2433/9333302/4951e97b00e1/pone.0252697.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2433/9333302/62be1c2c1779/pone.0252697.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2433/9333302/9ddaacc77bb1/pone.0252697.g008.jpg

Similar articles

1
Analyzing biomarker discovery: Estimating the reproducibility of biomarker sets.
PLoS One. 2022 Jul 28;17(7):e0252697. doi: 10.1371/journal.pone.0252697. eCollection 2022.
2
Robustness of chemometrics-based feature selection methods in early cancer detection and biomarker discovery.
Stat Appl Genet Mol Biol. 2013 Mar 13;12(2):207-23. doi: 10.1515/sagmb-2012-0067.
3
Reliable Biomarker discovery from Metagenomic data via RegLRSD algorithm.
BMC Bioinformatics. 2017 Jul 10;18(1):328. doi: 10.1186/s12859-017-1738-1.
4
Reproducible cancer biomarker discovery in SELDI-TOF MS using different pre-processing algorithms.
PLoS One. 2011;6(10):e26294. doi: 10.1371/journal.pone.0026294. Epub 2011 Oct 14.
5
A random forest based biomarker discovery and power analysis framework for diagnostics research.
BMC Med Genomics. 2020 Nov 23;13(1):178. doi: 10.1186/s12920-020-00826-6.
6
Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification
7
Consistent metagenomic biomarker detection via robust PCA.
Biol Direct. 2017 Jan 31;12(1):4. doi: 10.1186/s13062-017-0175-4.
8
A simple method to combine multiple molecular biomarkers for dichotomous diagnostic classification.
BMC Bioinformatics. 2006 Oct 10;7:442. doi: 10.1186/1471-2105-7-442.
9
Stable feature selection based on the ensemble L1-norm support vector machine for biomarker discovery.
BMC Genomics. 2016 Dec 22;17(Suppl 13):1026. doi: 10.1186/s12864-016-3320-z.
10
Improving the efficiency of biomarker identification using biological knowledge.
Pac Symp Biocomput. 2009:427-38.

Cited by

1
Deep-learning-enabled multi-omics analyses for prediction of future metastasis in cancer.
bioRxiv. 2025 May 22:2025.05.16.654579. doi: 10.1101/2025.05.16.654579.
2
Towards early diagnosis of Alzheimer's disease: advances in immune-related blood biomarkers and computational approaches.
Front Immunol. 2024 Apr 23;15:1343900. doi: 10.3389/fimmu.2024.1343900. eCollection 2024.
3
Strengths and limitations of non-disclosive data analysis: a comparison of breast cancer survival classifiers using VisualSHIELD.
Front Genet. 2024 Jan 29;15:1270387. doi: 10.3389/fgene.2024.1270387. eCollection 2024.
4
Current State of Knowledge on Blood and Tissue-Based Biomarkers for Opisthorchis viverrini-induced Cholangiocarcinoma: A Review of Prognostic, Predictive, and Diagnostic Markers.
Asian Pac J Cancer Prev. 2024 Jan 1;25(1):25-41. doi: 10.31557/APJCP.2024.25.1.25.