A Bayesian approach to identify genes and gene-level SNP aggregates in a genetic analysis of cancer data.

Author information

Stingo Francesco C, Swartz Michael D, Vannucci Marina

Affiliations

Department of Biostatistics, MD Anderson Cancer Center, 1400 Pressler St. Houston, TX 77030, USA.

Department of Biostatistics, UT School of Public Health, 1200 Pressler St. Houston, TX 77030, USA.

Publication information

Stat Interface. 2015;8(2):137-151. doi: 10.4310/SII.2015.v8.n2.a2.

DOI: 10.4310/SII.2015.v8.n2.a2

PMID: 28989562

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5630184/

Abstract

Complex diseases, such as cancer, arise from complex etiologies involving multiple single-nucleotide polymorphisms (SNPs), each contributing a small amount to the overall risk of disease. Many researchers have therefore moved beyond single-SNP analysis methods and focus instead on groups of SNPs, for example by analyzing haplotypes. More recently, pathway-based methods have been proposed that use prior biological knowledge of gene function to achieve a more powerful analysis of genome-wide association study (GWAS) data. In this paper we propose a novel Bayesian modeling framework to identify molecular biomarkers for disease prediction. Our method combines pathway-based approaches with multiple-SNP analyses of a specified region of interest. The model's development is motivated by SNP data from a lung cancer study. In our approach we define gene-level scores based on SNP allele frequencies and use a linear modeling setting to study the scores' association with the observed phenotype. The basic idea behind the gene-level scores is to weight the SNPs within a gene according to their rarity, based on the genotype frequencies expected under Hardy-Weinberg equilibrium. The resulting scores give more importance to unusually low frequencies, i.e. to SNPs that might indicate peculiar genetic differences between subjects belonging to different groups. An additional feature of our approach is that we incorporate information on SNP-to-SNP associations into the model. In particular, we use network priors that model the linkage disequilibrium between SNPs. For posterior inference, we design a stochastic search method that identifies significant biomarkers (genes and SNPs) for disease prediction. We assess performance on simulated data and compare results with existing approaches. We then show the ability of the proposed methodology to detect relevant genes and associated SNPs in a lung cancer dataset.
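The rarity-weighting idea behind the gene-level scores can be illustrated with a minimal sketch. The abstract does not state the exact weighting formula, so everything below is an assumption made for illustration: genotypes are coded as minor-allele counts (0, 1, 2), each observed genotype is weighted by the negative log of its expected Hardy-Weinberg frequency, and a subject's gene score is the sum of these weights over the SNPs in the gene. The function names `hwe_genotype_freqs` and `gene_scores` are hypothetical and do not come from the paper.

```python
import numpy as np


def hwe_genotype_freqs(maf):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium.

    maf: minor-allele frequency per SNP, shape (n_snps,).
    Returns an array of shape (n_snps, 3) whose columns are the expected
    frequencies of carrying 0, 1, or 2 copies of the minor allele.
    """
    q = np.asarray(maf, dtype=float)
    p = 1.0 - q
    return np.stack([p ** 2, 2.0 * p * q, q ** 2], axis=-1)


def gene_scores(genotypes, maf=None):
    """Illustrative gene-level score for each subject (assumed form).

    genotypes: integer array (n_subjects, n_snps) of minor-allele counts
               for the SNPs mapped to one gene.
    maf:       optional minor-allele frequencies; estimated from the
               sample when not supplied.
    """
    G = np.asarray(genotypes, dtype=int)
    if maf is None:
        maf = G.mean(axis=0) / 2.0
    # Clip to avoid log(0) for genotypes with zero expected frequency.
    freqs = np.clip(hwe_genotype_freqs(maf), 1e-12, 1.0)
    # Assumed weighting: rarer expected genotypes receive larger weights.
    weights = -np.log(freqs)
    # Look up the weight of each subject's observed genotype at each SNP.
    snp_index = np.arange(G.shape[1])
    per_snp = weights[snp_index, G]  # shape (n_subjects, n_snps)
    return per_snp.sum(axis=1)
```

For example, `gene_scores(np.array([[0, 0], [0, 2]]), maf=[0.5, 0.1])` gives the second subject a higher score than the first, because a homozygous minor genotype at a SNP with minor-allele frequency 0.1 is expected in only about 1% of subjects. In the framework described above, such scores would then enter a linear model as predictors of the phenotype; that step, the network priors, and the stochastic search are not shown here.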
