Rouse Rodney, Min Min, Francke Sabine, Mog Steven, Zhang Jun, Shea Katherine, Stewart Sharron, Colatsky Thomas
Division of Applied Regulatory Science, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA.
Division of Biometrics VI, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA.
Toxicol Pathol. 2015 Jul;43(5):662-74. doi: 10.1177/0192623314562072. Epub 2014 Dec 17.
Attempts to characterize and formally qualify biomarkers for regulatory purposes have raised questions about how histological and histopathological methods impact the evaluation of biomarker performance. A group of pathologists was asked to analyze digitized images prepared from rodent kidney injury experiments in studies designed to investigate sources of variability in histopathology evaluations. Study A maximized variability by using samples from diverse studies and providing minimal guidance, contextual information, or opportunities for pathologist interaction. Study B was designed to limit interpathologist variability by using more uniform image sets from different locations within the same kidneys and by allowing pathologist-selected interactions to discuss and identify the location and injury to be evaluated, but without providing a lexicon or peer review. Results from these studies suggest that differences between pathologists and across models of disease are the largest sources of variability in evaluations and that blind evaluations do not generally make a significant difference. These results generally align with recommendations from both industry and the U.S. Food and Drug Administration and should inform future studies examining the effects of common lexicons and scoring criteria, peer review, and blind evaluations in the context of biomarker performance assessment.