Wheless Lee, Mosley Dominique, Dochtermann Daniel, Pyarajan Saiju, Gonzalez Katlyn, Weiss Rachel, Maas Kyle, Zhang Siwei, Yao Lydia, Xu Yaomin, Madden Christopher, Ike Jacqueline, Smith Isabelle T, Grossarth Sarah, Wilson Otis, Hung Adriana, Fillmore Nathanael R, Brown Kevin, Landi Maria Teresa, Hartman Rebecca I
Tennessee Valley Healthcare System VA Medical Center, 719 Thompson Lane, Suite 26300, Nashville, TN, 37215, USA.
Division of Epidemiology, Vanderbilt University Medical Center Department of Medicine, Vanderbilt University Medical Center, Nashville, USA.
Arch Dermatol Res. 2025 Jan 24;317(1):308. doi: 10.1007/s00403-024-03780-w.
Cases for a disease can be defined broadly using diagnostic codes, or narrowly using gold-standard confirmation that is often unavailable in large administrative datasets. These different definitions can substantially affect the results and conclusions of studies. We conducted this study to assess how using melanoma phecodes versus histologic confirmation of invasive or in situ melanoma affects the results of a genome-wide association study (GWAS) in the Million Veteran Program. Melanoma status was determined in three ways: (1) the presence of two or more phecodes, (2) histologically confirmed invasive melanoma, and (3) histologically confirmed melanoma in situ. We conducted a GWAS for variants with minor allele frequencies of 1% or greater. There were 45,665 cases in the phecode cohort, 5,364 cases in the confirmed invasive melanoma cohort, and 4,792 cases in the confirmed melanoma in situ cohort. There were 20,457 variants significant at the genome-wide level in the phecode cohort, 2,582 in the invasive melanoma cohort, and 1,989 in the melanoma in situ cohort. Most of the variants identified in the phecode cohort did not replicate in the histologically confirmed cohorts. The different case definitions led to large differences in sample size and in the number of variants associated at the genome-wide level. Unvalidated and imprecise case definitions can lead to less accurate results. Investigators should use validated phenotypes when gold-standard definitions are not available.
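The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the input layouts, and the toy variant IDs are hypothetical. The 5e-8 significance threshold is the conventional GWAS cutoff and is an assumption, since the abstract does not state the value used. The abstract also does not say whether the two phecodes must fall on distinct dates, so the sketch simply counts recorded entries per person.

```python
from collections import Counter

GENOME_WIDE_P = 5e-8  # conventional cutoff; assumed, not stated in the abstract
MIN_MAF = 0.01        # "minor allele frequencies of 1% or greater"


def phecode_cases(code_records, min_codes=2):
    """Return IDs of people with at least `min_codes` melanoma phecode entries.

    code_records: iterable of person IDs, one entry per recorded phecode
    (hypothetical layout).
    """
    counts = Counter(code_records)
    return {pid for pid, n in counts.items() if n >= min_codes}


def significant_variants(results, p_threshold=GENOME_WIDE_P, min_maf=MIN_MAF):
    """Return variant IDs passing the MAF filter and the p-value threshold.

    results: iterable of (variant_id, maf, p_value) tuples (hypothetical layout).
    """
    return {vid for vid, maf, p in results if maf >= min_maf and p < p_threshold}


def replication_fraction(discovery, confirmation):
    """Share of discovery-set variants that are also in the confirmation set."""
    if not discovery:
        return 0.0
    return len(discovery & confirmation) / len(discovery)


# Toy data only; these are not results from the study.
phecode_hits = significant_variants([
    ("rs1", 0.20, 1e-9),
    ("rs2", 0.005, 1e-12),  # dropped: MAF below 1%
    ("rs3", 0.30, 1e-10),
    ("rs4", 0.10, 1e-3),    # dropped: not genome-wide significant
])
invasive_hits = significant_variants([("rs1", 0.20, 1e-8)])
print(replication_fraction(phecode_hits, invasive_hits))
```

A set intersection treats replication as "significant in both cohorts", which is one simple reading of the abstract's wording. A real analysis would also account for linkage disequilibrium and for the smaller sample size of the confirmed cohorts.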