Mushlin Richard A, Gallagher Stephen, Kershenbaum Aaron, Rebbeck Timothy R
PsychoGenics Inc, Tarrytown, NY, USA.
PLoS One. 2009;4(3):e4862. doi: 10.1371/journal.pone.0004862. Epub 2009 Mar 16.
BACKGROUND: The etiology of commonly occurring diseases may involve complex combinations of genes and exposures, resulting in etiologic heterogeneity. We present a computational algorithm that employs clique-finding for heterogeneity and multidimensionality in biomedical and epidemiological research (the "CHAMBER" algorithm).
METHODOLOGY/PRINCIPAL FINDINGS: This algorithm uses graph-building to (1) identify genetic variants that influence disease risk and (2) predict individuals at risk for disease based on inherited genotype. We use a set-covering algorithm to identify optimal cliques and a Boolean function that identifies etiologically heterogeneous groups of individuals. We evaluated this approach using simulated case-control genotype-disease associations involving two- and four-gene patterns. The CHAMBER algorithm correctly identified these simulated etiologies. We also applied the algorithm to two population-based case-control studies of breast and endometrial cancer in African American and Caucasian women, using data on genotypes involved in steroid hormone metabolism. We identified novel patterns in both cancer sites that involved genes that sulfate or glucuronidate estrogens or catecholestrogens. These associations were consistent with the hypothesized biological functions of these genes. We also identified cliques representing the joint effect of multiple candidate genes in all groups, suggesting the existence of biologically plausible combinations of hormone metabolism genes in both breast and endometrial cancer in both races.
CONCLUSIONS/SIGNIFICANCE: The CHAMBER algorithm may have utility in exploring the multifactorial etiology and etiologic heterogeneity in complex disease.
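The pipeline described in the methodology (graph-building, clique-finding, set covering, and a Boolean risk function) can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not say how edges are defined or which clique-finding and set-covering procedures are used. The sketch assumes that two genotype nodes are linked when they co-occur more often in cases than in controls (the `min_diff` threshold is hypothetical), and it substitutes the standard Bron-Kerbosch enumeration and a greedy set cover. All function names and the toy data are invented for illustration.

```python
from itertools import combinations

def carriers(people, pattern):
    """Indices of individuals carrying every (gene, genotype) pair in pattern."""
    return {i for i, p in enumerate(people)
            if all(p.get(g) == v for g, v in pattern)}

def build_graph(cases, controls, min_diff=0.4):
    """Assumed edge rule: link two genotypes that co-occur more in cases."""
    nodes = sorted({(g, v) for p in cases for g, v in p.items()})
    adj = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        if a[0] == b[0]:
            continue  # two genotypes of the same gene cannot co-occur
        f_case = len(carriers(cases, (a, b))) / len(cases)
        f_ctrl = len(carriers(controls, (a, b))) / len(controls)
        if f_case - f_ctrl >= min_diff:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def maximal_cliques(adj):
    """Bron-Kerbosch enumeration (no pivoting); single nodes are skipped."""
    found = []
    def expand(r, p, x):
        if not p and not x:
            if len(r) > 1:
                found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return found

def greedy_cover(cliques, cases):
    """Greedy set cover: pick the clique explaining the most uncovered cases."""
    uncovered = set(range(len(cases)))
    chosen = []
    while uncovered:
        best = max(cliques,
                   key=lambda c: len(carriers(cases, c) & uncovered),
                   default=None)
        if best is None or not carriers(cases, best) & uncovered:
            break  # no remaining clique explains any uncovered case
        chosen.append(best)
        uncovered -= carriers(cases, best)
    return chosen

def at_risk(person, chosen):
    """Boolean OR-of-ANDs: at risk if any chosen clique is fully matched."""
    return any(all(person.get(g) == v for g, v in c) for c in chosen)

def person(a, b, c, d):
    """Toy individual with binary genotypes at hypothetical genes A-D."""
    return dict(zip("ABCD", (a, b, c, d)))

# Toy data with two simulated etiologies: (A=1 and B=1) or (C=1 and D=1).
CASES = [person(1, 1, 0, 0), person(1, 1, 1, 0), person(1, 1, 0, 1),
         person(0, 0, 1, 1), person(1, 0, 1, 1), person(0, 1, 1, 1)]
CONTROLS = [person(1, 0, 0, 1), person(0, 1, 1, 0), person(0, 0, 0, 0),
            person(1, 0, 1, 0), person(0, 1, 0, 1), person(0, 0, 1, 0)]

graph = build_graph(CASES, CONTROLS)
chosen = greedy_cover(maximal_cliques(graph), CASES)
for clique in chosen:
    print(sorted(clique))
```

On this toy data the two selected cliques are {A=1, B=1} and {C=1, D=1}, and `at_risk` returns true for every case and false for every control. Expressing the result as an OR over cliques is one simple way to represent etiologic heterogeneity, since each clique describes a distinct subgroup of cases.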