Department of Orthopaedic Surgery, Vanderbilt University Medical Center, Nashville, TN, USA.
Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA.
Clin Orthop Relat Res. 2023 Mar 1;481(3):491-508. doi: 10.1097/CORR.0000000000002282. Epub 2022 Jun 21.
BACKGROUND: Large national databases have become a common source of information on patterns of cancer care in the United States, particularly for low-incidence diseases such as sarcoma. Although aggregating information from many hospitals can achieve statistical power, this may come at a cost when complex variables must be abstracted from the medical record. It is not well understood how often the Surveillance, Epidemiology, and End Results (SEER) database and the National Cancer Database (NCDB) have been used in musculoskeletal sarcoma research over the last two decades, or whether their use tends to produce papers with conflicting findings.
QUESTIONS/PURPOSES: (1) Is the number of published studies using the SEER and NCDB databases in musculoskeletal sarcoma research increasing over time? (2) What are the author, journal, and content characteristics of these studies? (3) Do studies using the SEER and the NCDB databases for similar diagnoses and study questions report concordant or discordant key findings? (4) Are the administrative data reported by our institution to the SEER and the NCDB databases concordant with the data in our longitudinally maintained, physician-run orthopaedic oncology dataset?
METHODS: To answer our first three questions, PubMed was searched from 2001 through 2020 for all studies using the SEER or the NCDB databases to evaluate sarcoma. Studies were excluded from the review if they did not use these databases or studied anatomic locations other than the extremities, nonretroperitoneal pelvis, trunk, chest wall, or spine. To answer our first question, the number of SEER and NCDB studies was counted by year. The publication rate over the 20-year span was assessed with simple linear regression modeling. The difference in the mean number of studies per year between 5-year intervals (2001-2005, 2006-2010, 2011-2015, 2016-2020) was also assessed with Student t-tests. To answer our second question, we recorded and summarized descriptive data regarding author, journal, and content for these studies. To answer our third question, we grouped all studies by diagnosis and then identified studies that shared the same diagnosis and a similar major study question with at least one other study. We then categorized study questions (and their associated studies) as having concordant findings, discordant findings, or mixed findings. Proportions of studies with concordant, discordant, or mixed findings were compared. To answer our fourth question, we performed a coding audit assessing the concordance of nationally reported administrative data from our institution with data from our longitudinally maintained, physician-run orthopaedic oncology dataset in a series of patients treated during the past 3 years. Our orthopaedic oncology dataset is maintained on a weekly basis by the senior author, who manually records data directly from the medical record and sarcoma tumor board consensus notes; this dataset served as the gold standard for data comparison. We compared date of birth, surgery date, margin status, tumor size, clinical stage, and adjuvant treatment.
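The simple linear regression of yearly study counts described above can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function name `ols_slope_intercept` is hypothetical, and the abstract does not report the per-year counts, so no study data are included here.

```python
from statistics import mean


def ols_slope_intercept(years, counts):
    """Fit counts = intercept + slope * year by ordinary least squares.

    Hypothetical helper for illustration only. `years` and `counts` are
    equal-length sequences, e.g. one entry per publication year.
    """
    if len(years) != len(counts) or len(years) < 2:
        raise ValueError("need at least two paired observations")
    x_bar = mean(years)
    y_bar = mean(counts)
    # Sum of squared deviations of the predictor (year).
    sxx = sum((x - x_bar) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("all years are identical; slope is undefined")
    # Sum of cross-products of deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept
```

Under this reading, the slope is the estimated change in the number of published studies per calendar year, which is the quantity a regression coefficient β for year would represent.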
RESULTS: The number of musculoskeletal sarcoma studies using the SEER and the NCDB databases has steadily increased over time in a linear regression model (β = 2.51; p < 0.001). The mean number of studies per year tripled during 2016-2020 compared with 2011-2015 (39 versus 13 studies; mean difference 26 ± 11; p = 0.03). Of the 299 studies in total, 56% (168 of 299) have been published since 2018. Nineteen institutions published more than five studies, and the most published by a single institution was 13. Orthopaedic surgeons authored 35% (104 of 299) of studies, and medical oncology journals published 44% (130 of 299). Of the 94 studies (31% of total [94 of 299]) that shared a major study question with at least one other study, 35% (33 of 94) reported discordant key findings, 29% (27 of 94) reported mixed key findings, and 44% (41 of 94) reported concordant key findings. Both concordant and discordant groups included papers on prognostic factors, demographic factors, and treatment strategies. When we compared nationally reported administrative data from our institution with our orthopaedic oncology dataset, we found clinically important discrepancies in adjuvant treatment (19% [15 of 77]), tumor size (21% [16 of 77]), surgery date (23% [18 of 77]), surgical margins (38% [29 of 77]), and clinical stage (77% [59 of 77]).
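As a quick arithmetic check, the coding-audit discrepancy percentages can be recomputed from the counts reported above. This is only a sketch: the counts and the denominator of 77 patients come from the text, while the variable and function names are illustrative.

```python
# Discrepancy counts as reported in the results (out of 77 audited patients).
AUDITED_PATIENTS = 77
discrepancies = {
    "adjuvant treatment": 15,
    "tumor size": 16,
    "surgery date": 18,
    "surgical margins": 29,
    "clinical stage": 59,
}


def discrepancy_rates(counts, total):
    """Return each field's discrepancy rate as a whole-number percentage."""
    return {field: round(100 * n / total) for field, n in counts.items()}


rates = discrepancy_rates(discrepancies, AUDITED_PATIENTS)
for field, pct in rates.items():
    print(f"{field}: {pct}% ({discrepancies[field]} of {AUDITED_PATIENTS})")
```

The rounded values reproduce the reported 19%, 21%, 23%, 38%, and 77%.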
CONCLUSION: Appropriate use of databases in musculoskeletal cancer research is essential to promote clear interpretation of findings, as almost two-thirds of studies we evaluated that asked similar study questions produced discordant or mixed key findings. Readers should be mindful of the differences in what each database seeks to convey because asking the same questions of different databases may result in different answers depending on what information each database captures. Likewise, differences in how studies determine which patients to include or exclude, how they handle missing data, and what they choose to emphasize may result in different messages getting drawn from large-database studies. Still, given the rarity and heterogeneity of sarcomas, these databases remain particularly useful in musculoskeletal cancer research for nationwide incidence estimations, risk factor/prognostic factor assessment, patient demographic and hospital-level variable assessment, patterns of care over time, and hypothesis generation for future prospective studies.
LEVEL OF EVIDENCE: Level III, therapeutic study.