Department of Chemistry, University of California, Berkeley, CA 94720.
Department of Integrative Omics for Biomedical Sciences, Yonsei University Graduate School, Seoul, Korea.
Proc Natl Acad Sci U S A. 2018 Feb 6;115(6):1322-1327. doi: 10.1073/pnas.1717960115. Epub 2018 Jan 22.
Prevention and early intervention are the most effective ways of avoiding or minimizing psychological, physical, and financial suffering from cancer. However, such proactive action requires the ability to predict the individual's susceptibility to cancer with a measure of probability. Of the triad of cancer-causing factors (inherited genomic susceptibility, environmental factors, and lifestyle factors), the inherited genomic component may be derivable from the recent public availability of a large body of whole-genome variation data. However, genome-wide association studies have so far shown limited success in predicting the inherited susceptibility to common cancers. We present here a multiple classification approach for predicting individuals' inherited genomic susceptibility to acquire the most likely phenotype among a panel of 20 major common cancer types plus 1 "healthy" type by application of a supervised machine-learning method under competing conditions among the cohorts of the 21 types. This approach suggests that, depending on the phenotypes of 5,919 individuals of "white" ethnic population in this study, (i) the portion of the cohort of a cancer type who acquired the observed type due to mostly inherited genomic susceptibility factors ranges from about 33 to 88% (or its corollary: the portion due to mostly environmental and lifestyle factors ranges from 12 to 67%), and (ii) on an individual level, the method also predicts individuals' inherited genomic susceptibility to acquire the other types ranked with associated probabilities. These probabilities may provide practical information for individuals, health professionals, and health policymakers related to prevention and/or early intervention of cancer.
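The two outputs described in the abstract, a ranked list of phenotype probabilities per individual and a per-cohort portion, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not name the classifier or how its probabilities are computed, so `softmax` stands in for whatever the supervised method produces, the labels `type_A` and `type_B` are placeholders, and `matched_fraction` reflects only one possible reading of "portion of the cohort", namely the share of a cohort whose top-ranked predicted type equals the observed type.

```python
from collections import defaultdict
from math import exp


def softmax(scores):
    """Turn arbitrary per-class scores into probabilities that sum to 1.

    Assumption: a stand-in for the (unspecified) classifier's probability output.
    """
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def rank_phenotypes(scores):
    """Return (label, probability) pairs, most likely phenotype first."""
    probs = softmax(scores)
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)


def matched_fraction(observed, predicted):
    """Per observed type, the fraction of individuals whose top prediction matches.

    Assumption: one possible reading of the cohort-level portion in the abstract.
    """
    hits = defaultdict(int)
    sizes = defaultdict(int)
    for obs, pred in zip(observed, predicted):
        sizes[obs] += 1
        hits[obs] += int(obs == pred)
    return {label: hits[label] / sizes[label] for label in sizes}


if __name__ == "__main__":
    # Hypothetical scores for one individual over three placeholder types.
    individual = {"healthy": 2.0, "type_A": 1.0, "type_B": 0.0}
    for label, prob in rank_phenotypes(individual):
        print(f"{label}: {prob:.2f}")

    # Hypothetical observed vs. top-predicted types for a tiny cohort.
    observed = ["type_A", "type_A", "type_B", "healthy"]
    predicted = ["type_A", "type_B", "type_B", "healthy"]
    print(matched_fraction(observed, predicted))
```

Under this reading, the complement of each matched fraction would correspond to the portion attributed mostly to environmental and lifestyle factors, which is why the abstract's two ranges (33 to 88% and 12 to 67%) sum to 100% at their endpoints.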