A towards-multidimensional screening approach to predict candidate genes of rheumatoid arthritis based on SNP, structural and functional annotations.

Author information

Department of Biophysics, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China.

Publication information

BMC Med Genomics. 2010 Aug 20;3:38. doi: 10.1186/1755-8794-3-38.

Abstract

BACKGROUND

According to the Genetic Analysis Workshops (GAW), hundreds of thousands of SNPs have been tested for association with rheumatoid arthritis. Traditional genome-wide association studies (GWAS) have been developed to identify susceptibility genes using a "most significant SNPs/genes" model. However, many minor- or modest-risk genes are likely to be missed after adjustment for multiple testing. This screening process applies strict statistical thresholds that aim to identify susceptibility genes based only on a statistical model, without considering multidimensional biological similarities in sequence arrangement, crystal structure, or functional categories/biological pathways between candidate and known disease genes.

METHODS

Multidimensional screening approaches, combined with traditional statistical genetics methods, can take into account multiple kinds of biological background: genetic mutation, structural annotation, and functional annotation. Here we introduce a newly developed multidimensional screening approach for rheumatoid arthritis candidate genes that considers all SNPs with nominal evidence of Bayesian association (BFLn > 0), together with the structural and functional similarities of the corresponding genes or proteins.
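The abstract does not state how BFLn is computed, so the following is only a minimal sketch of a "log Bayes factor above zero" filter under stated assumptions. It assumes allele counts per SNP in cases and controls, a Beta(1, 1) prior on allele frequency, H0 meaning one shared frequency and H1 meaning separate frequencies. The function names and the toy SNP labels are hypothetical and are not taken from the paper.

```python
from math import lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def ln_bayes_factor(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """ln BF of H1 (cases and controls have different allele frequencies)
    against H0 (one shared frequency).

    Assumption: Beta(1, 1) priors, so the prior normalisers are ln B(1, 1) = 0
    and the binomial coefficients cancel between H1 and H0.
    """
    ln_h1 = (log_beta(case_alt + 1, case_ref + 1)
             + log_beta(ctrl_alt + 1, ctrl_ref + 1))
    ln_h0 = log_beta(case_alt + ctrl_alt + 1, case_ref + ctrl_ref + 1)
    return ln_h1 - ln_h0


def nominal_snps(counts):
    """Keep SNPs whose ln BF is above zero (nominal evidence for H1).

    `counts` maps a SNP label to (case_alt, case_ref, ctrl_alt, ctrl_ref).
    """
    return [snp for snp, c in counts.items() if ln_bayes_factor(*c) > 0]


if __name__ == "__main__":
    # Toy counts for illustration only; not data from the study.
    toy = {
        "snp_a": (90, 10, 10, 90),   # very different allele frequencies
        "snp_b": (50, 50, 50, 50),   # identical allele frequencies
    }
    print(nominal_snps(toy))
```

A threshold of zero on the log scale keeps every SNP for which the data favour H1 at all, which is far more permissive than a multiple-testing-corrected p-value cutoff. That permissiveness is why a second, biology-based filter is needed afterwards.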

RESULTS

Our multidimensional screening approach extracted all risk genes (BFLn > 0) using the odds ratio of hypothesis H1 to H0, and determined whether a particular group of genes shared underlying biological similarities with known disease genes. Using this method, we found 6614 risk SNPs in our Bayesian screen result set. Finally, we identified 146 likely causal genes for rheumatoid arthritis, including CD4, FGFR1, and KDR, which have been reported as high risk factors by recent studies. We note that 790 (96.1%) of the genes identified by GWAS could not easily be classified into functional categories or biological processes associated with the disease. In contrast, our candidate genes shared underlying biological similarities (e.g. they were in the same pathway or GO term) and contributed to disease etiology, although common variations in each of these genes make only modest contributions to disease risk. We also found 6141 risk SNPs that were too minor to be detected by conventional approaches, and associations between 58 candidate genes and rheumatoid arthritis were verified by literature retrieved from the NCBI PubMed module.
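The similarity step described above (candidates that sit in the same pathway or GO term as known disease genes) could be sketched as a simple shared-annotation filter. This is an illustrative simplification, not the authors' procedure: the abstract does not specify the similarity measure, and the function name, gene labels, and term labels below are hypothetical placeholders.

```python
def screen_candidates(candidates, known_disease_genes, annotations):
    """Return candidates that share at least one annotation term
    (e.g. a pathway or GO term) with any known disease gene.

    `annotations` maps a gene label to a set of term labels.
    The result is a list of (gene, sorted shared terms) pairs.
    """
    # Pool every term attached to a known disease gene.
    known_terms = set()
    for gene in known_disease_genes:
        known_terms |= annotations.get(gene, set())

    kept = []
    for gene in candidates:
        if gene in known_disease_genes:
            continue  # already known, so not a new candidate
        shared = annotations.get(gene, set()) & known_terms
        if shared:
            kept.append((gene, sorted(shared)))
    return kept


if __name__ == "__main__":
    # Placeholder labels for illustration only; not real annotations.
    toy_annotations = {
        "KNOWN_1": {"term_x", "term_y"},
        "CAND_1": {"term_y", "term_z"},
        "CAND_2": {"term_w"},
    }
    print(screen_candidates(["CAND_1", "CAND_2"], {"KNOWN_1"}, toy_annotations))
```

Requiring only one shared term is the loosest possible criterion. A stricter variant could score the overlap or test it for enrichment, but the abstract gives no detail on which choice was made.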

CONCLUSIONS

Our proposed approach to the analysis of GAW16 data for rheumatoid arthritis applies a method based on underlying biological similarities between candidate and known disease genes. Our method can identify likely causal candidate genes for rheumatoid arthritis, and can yield biological insights that are not detected when focusing only on the genes that give the strongest evidence after multiple testing. We hope that our proposed method complements the "most significant SNPs/genes" model and provides additional insights into the pathogenesis of rheumatoid arthritis and other diseases when searching datasets for hundreds of genetic variants.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/979d/2939610/54e85b71205a/1755-8794-3-38-1.jpg
