Mauguen Audrey, Zabor Emily C, Thomas Nancy E, Berwick Marianne, Seshan Venkatraman E, Begg Colin B
Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY.
Department of Dermatology, University of North Carolina, Chapel Hill, NC.
J Am Stat Assoc. 2017;112(517):54-63. doi: 10.1080/01621459.2016.1191499. Epub 2017 May 3.
We showcase a novel analytic strategy to identify sub-types of cancer that possess distinctive causal factors, i.e., sub-types that are "etiologically" distinct. The method involves the integrated analysis of two types of study design: an incident series of cases with double primary cancers, with detailed information on tumor characteristics that can be used to define the sub-types; and a case series of incident cases with information on known risk factors, which can be used to investigate the specific risk factors that distinguish the sub-types. The methods are applied to a rich melanoma dataset with detailed information on pathologic tumor factors and comprehensive information on known genetic and environmental risk factors for melanoma. The optimal sub-typing solution is identified using a novel clustering analysis that seeks to maximize a measure that characterizes the distinctiveness of the distributions of risk factors across the sub-types and that is a function of the correlations of tumor factors in the case-specific tumor pairs. This analysis is challenged by the presence of extensive missing data. If successful, studies of this nature offer the opportunity for efficient study design to identify unknown risk factors whose effects are concentrated in defined sub-types.