Department of Epidemiology and Biostatistics, Texas A&M University School of Public Health, College Station, Texas, USA.
Department of Biostatistics and Bioinformatics, Moffitt Cancer Center & Research Institute, Tampa, Florida, USA.
Stat Med. 2022 Apr 15;41(8):1361-1375. doi: 10.1002/sim.9282. Epub 2021 Dec 12.
In pathological studies, subjective assays, especially companion diagnostic tests, can dramatically affect the treatment of cancer. Binary diagnostic test results (i.e., positive vs negative) may vary between the pathologists or observers who read the tumor slides. Some tests have clearly defined criteria that yield highly concordant results, even with minimal training. Other tests are more challenging, and observers may achieve poor concordance even with training. While there are many statistically rigorous methods for measuring concordance between observers, we are unaware of a method that can identify how many observers are needed to determine whether a test can reach acceptable concordance at all. Here we introduce a statistical approach to assessing test performance when the test is read by multiple observers, as occurs in the real world. By plotting the number of observers against the estimated overall agreement proportion, we obtain a curve that plateaus at the average observer concordance. Diagnostic tests that are well defined and easily judged show high concordance and plateau after few interobserver comparisons. More challenging tests do not plateau until many interobserver comparisons are made, and typically reach a lower plateau, or even 0. We further propose a statistical test of whether the overall agreement proportion will drop to 0 as the number of pathologists grows large. The proposed analytical framework can be used to evaluate the difficulty of interpreting pathological test criteria and platforms, and to determine how pathology-based subjective tests will perform in the real world. The method could also be used outside of pathology, wherever concordance of a diagnosis or decision point relies on the subjective application of multiple criteria.
We apply this method to two recent PD-L1 studies to test whether the curve of the overall agreement proportion converges to 0, and to determine the minimal sufficient number of observers required to estimate the concordance plateau of their reads.
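The curve described above can be illustrated with a small simulation. The sketch below is not the authors' estimator; it assumes, for illustration only, that the overall agreement proportion for k observers is the fraction of cases on which k randomly chosen observers all give the same binary read, averaged over random observer subsets. The data-generating model (true status flipped independently with a fixed error probability), function name, and all parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def overall_agreement(reads, k, n_draws=200, rng=rng):
    """Estimate the overall agreement proportion for k observers:
    the fraction of cases on which k randomly chosen observers all
    give the same binary read, averaged over n_draws random subsets.
    (Illustrative definition only, not the paper's estimator.)"""
    n_cases, n_obs = reads.shape
    props = []
    for _ in range(n_draws):
        cols = rng.choice(n_obs, size=k, replace=False)  # pick k observers
        sub = reads[:, cols]
        all_agree = (sub == sub[:, [0]]).all(axis=1)  # rows where all k reads match
        props.append(all_agree.mean())
    return float(np.mean(props))

# Hypothetical data: 50 cases read by 10 observers, each read being the
# true binary status flipped independently with probability 0.1.
truth = rng.integers(0, 2, size=(50, 1))
flips = rng.random((50, 10)) < 0.1
reads = np.where(flips, 1 - truth, truth)

# Agreement curve: overall agreement proportion vs number of observers.
curve = {k: overall_agreement(reads, k) for k in range(2, 11)}
```

Plotting `curve` against k reproduces the qualitative behavior in the abstract: the curve declines as observers are added and flattens toward a plateau, which is high for easily judged tests and low (or 0) for challenging ones.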