Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies.

Author information

Wang Charlotte, Kao Wen-Hsin, Hsiao Chuhsing Kate

Affiliations

Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, 100, Taiwan.

Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, 100, Taiwan; Bioinformatics and Biostatistics Core, Division of Genomic Medicine, Research Center for Medical Excellence, National Taiwan University, Taipei, 100, Taiwan; Department of Public Health, National Taiwan University, Taipei, 100, Taiwan.

Publication information

PLoS One. 2015 Aug 24;10(8):e0135918. doi: 10.1371/journal.pone.0135918. eCollection 2015.

Abstract

The availability of high-throughput genomic data has led to several challenges in recent genetic association studies, including the large number of genetic variants that must be considered and the computational complexity of statistical analyses. Tackling these problems with a marker-set study such as SNP-set analysis can be an efficient solution. To construct SNP-sets, we first propose a clustering algorithm that employs Hamming distance to measure the similarity between strings of SNP genotypes and evaluates whether the given SNPs or SNP-sets should be clustered. A dendrogram can then be constructed from this distance measure, and the number of clusters can be determined. With the resulting SNP-sets, we next develop an association test, HDAT, to examine susceptibility to the disease of interest. Based on Hamming distance, the proposed test assesses whether the similarity between a diseased and a normal individual differs from the similarity between two individuals of the same disease status. The proposed methodology needs only genotype information: no inference of haplotypes is required, and the SNPs under consideration do not need to be located in nearby regions. The proposed clustering algorithm and association test are illustrated with applications and simulation studies. Compared with other existing methods, the clustering algorithm is faster and better at identifying sets containing SNPs that exert a similar effect. In addition, the simulation studies demonstrate that the proposed test works well for SNP-sets containing a large proportion of neutral SNPs. Furthermore, applying the clustering algorithm before testing a large data set helps narrow down the genetic regions that harbor susceptibility markers.
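As a rough illustration of the clustering idea, the sketch below computes Hamming distances between SNP genotype vectors and merges SNPs greedily by average linkage. It is not the authors' algorithm. The abstract does not specify the genotype encoding, the linkage rule, or the stopping criterion, so the 0/1/2 coding, the `cluster_snps` function, the `max_dist` threshold, and the SNP names `rs1` to `rs4` are all assumptions made for this example.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two equal-length genotype vectors differ."""
    if len(a) != len(b):
        raise ValueError("genotype vectors must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_snps(genotypes, max_dist):
    """Greedy average-linkage clustering of SNPs (illustrative only).

    genotypes: dict mapping a SNP name to its genotype codes, one per individual
               (assumed coding: 0, 1, 2 copies of the minor allele).
    max_dist:  hypothetical stopping threshold on the average Hamming
               distance, normalized by the number of individuals.
    """
    n_individuals = len(next(iter(genotypes.values())))
    clusters = [[name] for name in genotypes]

    def avg_dist(c1, c2):
        # Mean normalized Hamming distance over all cross-cluster SNP pairs.
        total = sum(hamming(genotypes[s], genotypes[t]) for s in c1 for t in c2)
        return total / (len(c1) * len(c2) * n_individuals)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]),
        )
        if avg_dist(clusters[i], clusters[j]) > max_dist:
            break  # remaining clusters are too dissimilar to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected

    return sorted(sorted(c) for c in clusters)


# Made-up genotypes for four hypothetical SNPs across six individuals.
snps = {
    "rs1": [0, 1, 2, 0, 1, 2],
    "rs2": [0, 1, 2, 0, 1, 1],
    "rs3": [2, 0, 0, 1, 2, 0],
    "rs4": [2, 0, 0, 1, 2, 1],
}
snp_sets = cluster_snps(snps, max_dist=0.25)
```

With these toy genotypes, `rs1` and `rs2` differ at one individual, as do `rs3` and `rs4`, so the sketch groups them into two SNP-sets. Recording the merge distances instead of stopping at a threshold would give the heights needed to draw a dendrogram.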

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8579/4547758/4fb5216ca6e2/pone.0135918.g001.jpg
