Machtay M, Glatstein E
Department of Radiation Oncology, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania, 19104, USA.
Oncologist. 1998;3(3):III-IV.
On returning from a medical meeting, we learned, sadly, that a patient, "Mr. B.," had passed away. His death was completely unexpected. He had been doing well nine months after a course of intensive radiotherapy for a locally advanced head and neck cancer; in his most recent follow-up notes, he was described as being in "complete remission." Nonetheless, he apparently died peacefully in his sleep from a cardiac arrest one night and was found the next day by a concerned neighbor. In our absence, his death certificate was filled out by a physician who did not know him well but did know why he had recently been treated in our department. The cause of death was listed as head and neck cancer. Not long after his death, we began to receive those notorious "requests for additional information" letters from the statistical office of a well-known cooperative group. Mr. B., as it turns out, was on a clinical trial, and it was "vital" to know further details of the circumstances of his passing. Perhaps this very large cancer had been controlled and Mr. B. succumbed to old age (helped along by the tobacco industry). On the other hand, maybe the residual "fibrosis" in his neck was actually packed with active tumor and his left carotid artery was finally 100% pinched off, or maybe he suffered a massive pulmonary embolism from cancer-related hypercoagulability. We completed the forms and requests with a succinct "cause of death uncertain," adding, "please have the Study Chairs call to discuss this difficult case."
Clinical reports of outcomes often emphasize the endpoint "disease-specific survival" (DSS). Like overall survival (OS), the DSS can be calculated by actuarial methods, with patients who have incomplete follow-up "censored" at the time of last follow-up pending further information. In the DSS, however, deaths unrelated to the index cancer are also censored, at the time of death; thus, a death from intercurrent disease is considered a "success" (to the investigator, that is; obviously, not to the patient and his or her family). The DSS rate will therefore always be equal to or superior to the OS rate. Obviously, any OS curve will ultimately come to zero if one waits long enough. There is thus a very logical rationale for reporting the DSS separately, particularly in diseases where death from intercurrent disease is expected to be common. Because the DSS does not necessarily come to zero, analyzing it allows researchers to better compare the biologic efficacy of two or more cancer treatments. Unlike some other endpoints, including local-regional control or freedom from progression, it takes into account the possibility of salvage therapy. DSS also focuses on an endpoint of interest to the public: death from cancer. In a recent popular media survey in which people were asked how they would choose to die if they could, 0% selected cancer.
However, there are two serious potential problems with heavy dependence on the DSS. First, since patients who die from intercurrent disease are considered "cured," it seriously inflates the apparent effectiveness of a cancer treatment. Given the same biologic disease and the same treatment, the DSS as calculated in an old, sick population at high risk of intercurrent death will be better than the DSS in a younger, healthier population whose major risk is from their cancer. This problem has been discussed with respect to early stage prostate cancer, in which the conservative approach of observation has been criticized. The studies at issue rely heavily on the DSS, suggesting that the DSS with "watchful waiting" (90% at 10 years) is comparable to other researchers' results with aggressive therapy. The problem is that these series of conservative management focus on a patient population (as opposed to individuals) with a high risk of competing causes of mortality, which is very different from the population of patients generally treated with aggressive therapy (in which some series have shown overall survival superior to that of age-matched controls). It is fallacious and illogical to compare nonrandomized series of observation to those of aggressive therapy.
In addition to the above problem, the use of DSS introduces another potential issue, which we will call the bias of cause-of-death interpretation. All statistical endpoints except OS (e.g., response rates, local-regional control, freedom from brain metastases) are known to depend heavily on the methods used to define the endpoint and are often subject to significant interobserver variability. There is no reason to believe that this problem does not occasionally occur when defining a death as due to the index cancer or to intercurrent disease, even though this issue has been poorly studied. In many oncologic situations, for example metastatic lung cancer, this form of bias does not exist. In some situations, such as head and neck cancer, this could be an intermediate problem (Was that lethal chest tumor a second primary or a metastasis? Would the fatal aspiration pneumonia have occurred if he still had a tongue? And what about Mr. B., described above?). In other situations, particularly relatively "good prognosis" neoplasms, this could be a substantial problem, especially if the adjudication of whether or not a death is cancer-related is performed solely by researchers who have an "interest" in demonstrating a good DSS. Our greatest concern about this form of bias relates to recent series on observation, such as those in early prostate cancer. It is interesting to note that although only 10% of the "observed" patients die from prostate cancer, many develop distant metastases by 10 years (approximately 40% among patients with intermediate grade tumors). The implication is that prostate cancer metastases are usually not in themselves lethal, which anyone experienced in taking care of prostate cancer patients will recognize as a misconception. It is also inconsistent with U.S. studies of metastatic prostate cancer, in which the median survival is two to three years. It is possible that many deaths attributed to intercurrent disease in "watchful waiting" series were in fact prostate cancer-related, perhaps related to failure to thrive, urosepsis, or pulmonary emboli. We will not know without an independent review of the medical records of individual patients; in some cases, even the most detailed review, sometimes even an autopsy, will not be conclusive.
Few data are available describing the problems created by cause-of-death interpretation bias. One small study, presented only in abstract form, assessed the cause of death in 50 randomly selected prostate cancer patients who died. Five experts in prostate cancer were asked to assign the cause of death as due to or not due to prostate cancer. The DSS varied from 21% to 35% among the five reviewers, a relative difference of 66%. Studies of autopsies, which are now rarely done in the U.S., have shown that fatal malignant tumors were occasionally missed by clinicians and, even more sobering, that an occasional patient thought to have died from metastatic cancer is found to have no tumor but to have died from a "benign" cause such as tuberculosis. One study suggested an error rate of approximately 8%.
Clearly the use of DSS is here to stay and is a useful adjunct to OS in analyzing randomized trials. There needs to be more research on the validity and interobserver reproducibility of the DSS. In the meantime, researchers should not report DSS without reporting OS, and the reasons for intercurrent deaths should be described; peer reviewers should enforce this. As with so many other problems with statistics in the medical literature, it is the job of the reader to remain skeptical. The rate of intercurrent deaths in a study should reflect the age and demographics of the study population. If the DSS is far superior to the OS, the population being studied may be unusually sick (and thus unrealistic), or there may be a bias in classifying the causes of death. Similarly, if the DSS and OS are identical (unless a highly virulent malignancy is being studied), it may suggest that the researchers have included only an unusually healthy (and thus unrealistic) patient population. Finally, we would also be a bit suspicious of a sizeable series that did not have any deaths considered to be of "uncertain" cause, unless the researchers specifically counted them as being due to the cancer. We honestly think that everybody has a few patients like Mr. B.
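
For readers who want to see how the censoring rule separates the two endpoints, the following is a minimal sketch of an actuarial (Kaplan-Meier style) calculation of OS and DSS from the same cohort. It is an illustration only, not the method of any study discussed here: the ten-patient cohort, the follow-up times, and the names `kaplan_meier`, `cohort`, `os_curve`, and `dss_curve` are all hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up in months for each patient
    events -- True if the endpoint occurred at that time, False if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= leaving
    return curve


# Hypothetical cohort: (months of follow-up, status at last contact).
cohort = [
    (6, "cancer"), (9, "other"), (12, "cancer"), (15, "alive"), (18, "other"),
    (24, "cancer"), (30, "alive"), (36, "other"), (40, "alive"), (48, "alive"),
]
times = [t for t, _ in cohort]

# OS: every death is an event; only living patients are censored.
os_curve = kaplan_meier(times, [s != "alive" for _, s in cohort])

# DSS: only cancer deaths are events; intercurrent ("other") deaths are
# censored at the time of death, exactly like incomplete follow-up.
dss_curve = kaplan_meier(times, [s == "cancer" for _, s in cohort])

print("OS :", [(t, round(s, 3)) for t, s in os_curve])
print("DSS:", [(t, round(s, 3)) for t, s in dss_curve])
```

In this made-up cohort the final OS estimate is about 0.31 and the final DSS estimate is 0.63. The entire gap comes from the three deaths labeled "other," so relabeling even one uncertain death, such as Mr. B.'s, moves the DSS while leaving the OS untouched.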