Ribera Ashley, Sugahara Otoe, Buchannan Tatiana, Vazquez Norma, Lyle Alicia N, Zhang Li, Danilenko Uliana I, Vesper Hubert W
Division of Laboratory Sciences, Centers for Disease Control and Prevention (CDC), Atlanta, Georgia, USA.
Thyroid. 2025 May;35(5):471-484. doi: 10.1089/thy.2024.0728. Epub 2025 May 7.
Performance of thyroid function assays can vary significantly. To address this issue, the Centers for Disease Control and Prevention (CDC) Clinical Standardization Programs conducted an interlaboratory comparison of free thyroxine (fT4) immunoassays (IAs) and laboratory-developed tests (LDTs). This assessment aimed to determine the current performance characteristics of these assays as a first step toward measurement standardization. Thyrotropin (TSH) IAs were also evaluated.
Assays measured 41 blinded individual-donor sera, including a sample from a pregnant woman (for fT4 analysis only), and three serum pools in duplicate over 2 days. Concentrations ranged from 11.3 to 32.1 pmol/L (0.881-2.49 ng/dL) for fT4 and from 0.337 to 21.6 mIU/L for TSH. Before recalibration, Passing-Bablok regression analysis compared assay performance with the CDC fT4 reference measurement procedure (RMP) or the TSH all-lab mean (ALM). Additionally, the impact of linear regression-based recalibration of assays to the CDC fT4 RMP or TSH ALM was estimated. Inter-assay agreement of sample classification according to the assay-specific reference interval (RI) was assessed pre- and post-recalibration.
A total of 21 fT4 and 17 TSH assays participated. Pre-recalibration, the median bias of TSH measurements to the ALM was -1.2% [confidence interval (CI) -1.8% to -0.4%], and good classification agreement among TSH assays was observed. All fT4 assays showed a negative median bias to the RMP, with larger bias among IAs (median: -20.3% [CI -21.5% to -19.4%]) than among LDTs (median: -4.5% [CI -6.1% to -3.2%]). Of the individual-donor sera, only 21 of 40 samples were classified uniformly by all fT4 assays, indicating poor inter-assay agreement. Post-recalibration, agreement improved to 33 of 40 individual-donor sera correctly classified by all tested IAs and LDTs. A similar improvement in post-recalibration median percent bias was observed for fT4 IAs (median: -0.2% [CI -1.2% to 0.6%]) and LDTs (median: -0.3% [CI -2.5% to 1.4%]).
The comparison among fT4 assays emphasizes the need for measurement standardization to improve accuracy and comparability. This and previous studies demonstrate that common fT4 RIs can be developed through standardization, enabling universal use of evidence-based clinical guidelines in patient care. Recalibration can effectively address the high variability among fT4 assays, supporting consistent diagnostic classification.
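The quantities reported above (median percent bias to a reference, linear regression-based recalibration, and the count of samples classified uniformly across assays) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the sample values, and the reference-interval bounds are hypothetical and are not data from the study. Ordinary least squares with the reference values regressed on the assay values stands in for the recalibration step, because the abstract does not specify the exact regression form, and Passing-Bablok regression is not implemented here.

```python
from statistics import mean, median


def percent_bias(assay, reference):
    """Per-sample percent bias of assay results relative to reference values."""
    return [100.0 * (a - r) / r for a, r in zip(assay, reference)]


def fit_recalibration(assay, reference):
    """Ordinary least squares fit: reference ~ slope * assay + intercept.

    Assumption: reference regressed on assay; the study's exact model is not
    given in the abstract.
    """
    mx, my = mean(assay), mean(reference)
    sxx = sum((x - mx) ** 2 for x in assay)
    sxy = sum((x - mx) * (y - my) for x, y in zip(assay, reference))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def recalibrate(values, slope, intercept):
    """Apply a fitted linear recalibration to assay results."""
    return [slope * v + intercept for v in values]


def classify(value, ri_low, ri_high):
    """Label a result against an assay-specific reference interval."""
    if value < ri_low:
        return "low"
    if value > ri_high:
        return "high"
    return "normal"


def uniformly_classified(labels_by_assay):
    """Count samples that receive the same label from every assay."""
    return sum(1 for labels in zip(*labels_by_assay) if len(set(labels)) == 1)


# Hypothetical fT4 values in pmol/L (not study data).
rmp = [12.0, 15.0, 20.0, 25.0, 30.0]      # reference measurement procedure
assay_a = [9.6, 12.0, 16.0, 20.0, 24.0]   # reads about 20% below the reference
assay_b = [12.0, 15.0, 20.0, 25.0, 30.0]  # agrees with the reference

# Median percent bias before recalibration.
bias_before = median(percent_bias(assay_a, rmp))

# Recalibrate assay A to the reference and recompute the median bias.
slope, intercept = fit_recalibration(assay_a, rmp)
assay_a_recal = recalibrate(assay_a, slope, intercept)
bias_after = median(percent_bias(assay_a_recal, rmp))

# Classification against hypothetical assay-specific reference intervals.
labels_a = [classify(v, 10.0, 19.0) for v in assay_a]
labels_b = [classify(v, 11.5, 24.0) for v in assay_b]
n_uniform = uniformly_classified([labels_a, labels_b])

print(f"median bias before: {bias_before:.1f}%, after: {bias_after:.1f}%")
print(f"uniformly classified: {n_uniform} of {len(rmp)}")
```

In this toy data set the lowest sample is labelled differently by the two assays because their reference intervals differ, which is the kind of disagreement that the inter-assay classification count is meant to capture.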