Accounting for control mislabeling in case-control biomarker studies.

Author information

Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, United Kingdom.

Publication information

J Proteome Res. 2011 Dec 2;10(12):5562-7. doi: 10.1021/pr200507b. Epub 2011 Nov 8.

DOI:10.1021/pr200507b

PMID:22010953

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC3314325/

Abstract

In biomarker discovery studies, the uncertainty associated with case and control labels is often overlooked. When label uncertainty is ignored, model parameters and predicted risk can become biased, sometimes severely. The most common situation is a control set that contains an unknown number of undiagnosed, or future, cases. This has a marked impact where the model needs to be well calibrated, e.g., when the prediction performance of a biomarker panel is evaluated. Failing to account for class label uncertainty may lead to underestimation of classification performance and bias in parameter estimates, which can in turn affect meta-analyses that combine evidence from multiple studies. Using a simulation study, we outline how conventional statistical models can be modified to address class label uncertainty, leading to well-calibrated prediction performance estimates and reduced bias in meta-analysis. We focus on the problem of mislabeled control subjects in case-control studies, i.e., when some of the control subjects are undiagnosed cases, although the procedures we report are generic. Uncertainty in control status is particularly common in biomarker discovery studies in genomic and molecular epidemiology, where control subjects are commonly sampled from the general population with an established expected disease incidence rate.
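The idea of modifying a conventional model to allow for mislabeled controls can be illustrated with a small simulation. The sketch below is not the authors' procedure; it is a minimal example under stated assumptions: a single standard-normal biomarker, true disease status following a logistic model, and a known probability `alpha` that a true case is recorded as a control. The function names (`simulate`, `fit`), the parameter values, and the sampling scheme (a simple random sample rather than a matched case-control design) are all hypothetical choices made for illustration. Under these assumptions the observed-label probability becomes P(y = 1 | x) = (1 - alpha) * sigmoid(b0 + b1 * x), which is fitted here by Fisher scoring; setting `alpha = 0` recovers ordinary logistic regression.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def simulate(n, b0, b1, alpha, rng):
    """Simulate (biomarker, observed label) pairs.

    True status follows a logistic model; each true case is recorded as a
    control with probability alpha (the assumed mislabeling mechanism).
    """
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        is_case = rng.random() < sigmoid(b0 + b1 * x)
        observed = 1 if (is_case and rng.random() >= alpha) else 0
        data.append((x, observed))
    return data


def fit(data, alpha=0.0, iterations=50, tol=1e-8):
    """Fisher scoring for P(y=1 | x) = (1 - alpha) * sigmoid(b0 + b1 * x).

    alpha = 0 gives ordinary logistic regression (labels taken at face value).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        u0 = u1 = 0.0          # score vector
        i00 = i01 = i11 = 0.0  # expected information matrix (2 x 2, symmetric)
        for x, y in data:
            p = sigmoid(b0 + b1 * x)       # P(true case | x)
            q = (1.0 - alpha) * p          # P(labeled as case | x)
            not_q = max(1.0 - q, 1e-12)    # guard against division by zero
            # Per-observation score and Fisher weight with respect to the
            # linear predictor b0 + b1 * x.
            s = (1.0 - p) if y == 1 else -(1.0 - alpha) * p * (1.0 - p) / not_q
            w = (1.0 - alpha) * p * (1.0 - p) ** 2 / not_q
            u0 += s
            u1 += s * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        # Solve the 2 x 2 system (information) * (step) = (score).
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


rng = random.Random(2011)
TRUE_B0, TRUE_B1, ALPHA = 0.0, 1.5, 0.3  # illustrative values only
sample = simulate(20000, TRUE_B0, TRUE_B1, ALPHA, rng)

naive = fit(sample, alpha=0.0)       # ignores label uncertainty
adjusted = fit(sample, alpha=ALPHA)  # allows for undiagnosed cases among controls

print("true     :", (TRUE_B0, TRUE_B1))
print("naive    :", tuple(round(b, 2) for b in naive))
print("adjusted :", tuple(round(b, 2) for b in adjusted))
```

In this toy setting the naive fit is expected to shift the intercept downward and attenuate the biomarker coefficient, while the adjusted fit should land near the simulating values. In practice `alpha` would have to come from external knowledge, such as the expected disease incidence in the population the controls were drawn from, and the result is only as good as that assumption.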
