Beaumont Hubert, Iannessi Antoine
Sciences, Median Technologies, Valbonne, France.
Front Oncol. 2023 Oct 4;13:1239570. doi: 10.3389/fonc.2023.1239570. eCollection 2023.
In lung cancer clinical trials with imaging, blinded independent central review (BICR) with double reads is recommended to reduce evaluation bias, and the Response Evaluation Criteria in Solid Tumors (RECIST) remain widely used. We retrospectively analyzed the inter-reader discrepancy rate over time, the risk factors for discrepancies related to baseline evaluations, and the potential of machine learning to predict inter-reader discrepancies.
We retrospectively analyzed five BICR clinical trials of patients receiving immunotherapy or targeted therapy for lung cancer. Double reads of 1724 patients, involving 17 radiologists, were performed using RECIST 1.1. We evaluated the rate of discrepancies over time according to four endpoints: progressive disease declared (PDD), date of progressive disease (DOPD), best overall response (BOR), and date of the first response (DOFR). Risk factors associated with discrepancies were analyzed, and two predictive models were evaluated.
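As an illustration of how a per-endpoint discrepancy rate could be computed from double reads, here is a minimal Python sketch. It is not the authors' code. The `Read` record, its field encodings (visit indices for dates, response labels as strings), and the choice to use all double-read patients as the denominator for every endpoint are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record of one reader's evaluation of one patient.
@dataclass(frozen=True)
class Read:
    pdd: bool             # progressive disease declared
    dopd: Optional[int]   # visit index of progression, None if no progression
    bor: str              # best overall response, e.g. "CR", "PR", "SD", "PD"
    dofr: Optional[int]   # visit index of first response, None if no response

ENDPOINTS = ("pdd", "dopd", "bor", "dofr")

def discrepancy_rates(pairs):
    """Fraction of patients whose two reads differ, per endpoint.

    Assumption: two reads are discordant on an endpoint whenever their
    values differ, and the denominator is all double-read patients.
    """
    if not pairs:
        raise ValueError("at least one pair of reads is required")
    n = len(pairs)
    return {
        endpoint: sum(getattr(a, endpoint) != getattr(b, endpoint) for a, b in pairs) / n
        for endpoint in ENDPOINTS
    }

# Toy data: four patients, each read by two readers (made-up values).
pairs = [
    (Read(True, 3, "PD", None), Read(True, 3, "PD", None)),      # full agreement
    (Read(True, 3, "SD", None), Read(True, 4, "SD", None)),      # progression date differs
    (Read(False, None, "PR", 2), Read(True, 5, "SD", None)),     # all four endpoints differ
    (Read(False, None, "PR", 2), Read(False, None, "PR", 3)),    # response date differs
]
rates = discrepancy_rates(pairs)
```

Under this definition, a disagreement on whether progression occurred also counts as a disagreement on its date, so date-based rates can never be lower than the corresponding event-based rates.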
At the end of the trials, discrepancy rates did not differ between trials. On average, the discrepancy rates were 21.0%, 41.0%, 28.8%, and 48.8% for PDD, DOPD, BOR, and DOFR, respectively. Over time, the discrepancy rate was higher for DOFR than for DOPD, and the rates increased as the trial progressed, even after accrual was completed. Readers rarely failed to find any disease: for fewer than 7% of patients, at least one reader selected only non-measurable disease (non-target lesions, NTLs). Readers often selected some of their target lesions (TLs) and NTLs in different organs, in 36.0-57.9% and 60.5-73.5% of patients, respectively. Rarely (4-8.1%) did the two readers select all of their TLs in different locations. Significant risk factors differed depending on the endpoint and the trial considered. Prediction performance was poor overall, but the positive predictive value exceeded 80%. The best classification was obtained for BOR.
Predicting discordance rates requires knowledge of patient accrual, patient survival, and the probability of discordance over time. In lung cancer trials, although risk factors for inter-reader discrepancies are known, they are only weakly significant, and the ability to predict discrepancies from baseline data is limited. To improve prediction accuracy, baseline-derived features would need to be enhanced or new ones created, taking other risk factors into account and investigating optimal reader pairings.
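To make the dependence on accrual, survival, and discordance probability concrete, the following Python sketch computes an expected cumulative discordance rate by calendar month under a deliberately simple model. The model is an assumption for illustration, not the method used in the study: one equally sized cohort enrolls per month during accrual, each patient survives a month with a constant probability, and a surviving, still-concordant patient becomes discordant that month with a constant probability. All function names and parameter values are hypothetical.

```python
def prob_discordant(follow_up_months, monthly_survival, monthly_discordance):
    """Probability that a patient has become discordant within the follow-up.

    Each month the patient survives with probability `monthly_survival`;
    if alive and still concordant, the two reads diverge with probability
    `monthly_discordance`. Patients who die concordant stay concordant.
    """
    still_at_risk = 1.0   # alive and concordant at the start of the month
    total = 0.0
    for _ in range(follow_up_months):
        total += still_at_risk * monthly_survival * monthly_discordance
        still_at_risk *= monthly_survival * (1.0 - monthly_discordance)
    return total

def expected_discordance_rate(month, accrual_months, monthly_survival, monthly_discordance):
    """Expected discordance rate among patients enrolled by calendar `month`.

    Assumes one equally sized cohort enrolls at months 0..accrual_months-1.
    """
    last_entry = min(month, accrual_months - 1)
    probabilities = [
        prob_discordant(month - entry, monthly_survival, monthly_discordance)
        for entry in range(last_entry + 1)
    ]
    return sum(probabilities) / len(probabilities)

# Hypothetical parameters: 12 months of accrual, 95% monthly survival,
# 5% monthly chance of a new discordance.
rate_at_end_of_accrual = expected_discordance_rate(12, 12, 0.95, 0.05)
rate_one_year_later = expected_discordance_rate(24, 12, 0.95, 0.05)
```

In this toy model the rate keeps rising after accrual ends, because patients enrolled late continue to accumulate follow-up. That matches the direction of the reported trend, though the values themselves carry no clinical meaning.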