Am J Epidemiol. 2021 Jun 1;190(6):962-976. doi: 10.1093/aje/kwaa234.
Epidemiologic studies often rely on questionnaire data, exposure measurement tools, and/or biomarkers to identify risk factors and the underlying carcinogenic processes. An emerging and promising complementary approach to investigating cancer etiology is the study of somatic "mutational signatures" that endogenous and exogenous processes imprint on the cellular genome. Advances in DNA sequencing technology and analytical algorithms now make it possible to identify these signatures within a complex web of somatic mutations. This approach is at the core of the Sherlock-Lung study (2018-ongoing), a retrospective case-only study of over 2,000 lung cancers in never-smokers (LCINS) that uses the distinct mutational patterns observed within LCINS tumors to trace back possible exposures or endogenous processes. Whole-genome and transcriptome sequencing, genome-wide methylation, microbiome, and other analyses are integrated with data from histological and radiological imaging, lifestyle, demographic characteristics, environmental and occupational exposures, and medical records to classify LCINS into subtypes that could reveal distinct risk factors. To date, we have received samples and data from 1,370 LCINS cases from 17 study sites worldwide, and whole-genome sequencing has been completed on 1,257 samples. Here, we present the Sherlock-Lung study design and analytical strategy, and we illustrate some empirical challenges and the potential of this approach for future epidemiologic studies.