Xu Yanxun, Zheng Xiaofeng, Yuan Yuan, Estecio Marcos R, Issa Jean-Pierre, Ji Yuan, Liang Shoudan
Department of Statistics, Rice University, Houston, TX.
Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX.
IEEE Int Workshop Genomic Signal Process Stat. 2012 Dec;2012:42-45. doi: 10.1109/GENSIPS.2012.6507722.
A single-nucleotide polymorphism (SNP) is a single base change in the DNA sequence and is the most common type of polymorphism. Because some SNPs have a major influence on disease susceptibility, detecting SNPs plays an important role in biomedical research. To take full advantage of next-generation sequencing (NGS) technology and detect SNPs more effectively, we propose a Bayesian approach that computes the posterior probability of a hidden nucleotide variation at each covered genomic position. A position with a higher posterior probability of hidden nucleotide variation is more likely to be a SNP. We apply the proposed method to detect SNPs in two cell lines: the prostate cancer cell line PC3 and the embryonic stem cell line H1. A comparison of our results with the dbSNP database shows a high overlap ratio (>95%). Positions that are called under our model but are absent from dbSNP may serve as candidates for new SNPs.
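The abstract does not state the likelihood or the prior used, so the following is only a minimal sketch of what a per-position posterior computation of this kind can look like, not the authors' model. It assumes a hypothetical three-genotype model (homozygous reference, heterozygous, homozygous alternate), a binomial read-count likelihood, and illustrative values for `error_rate` and `prior_variant`; the function name `variant_posterior` and all of its parameters are introduced here for illustration only.

```python
import math


def variant_posterior(n_ref, n_alt, error_rate=0.01, prior_variant=0.001):
    """Posterior probability that a position carries a non-reference allele.

    Illustrative assumptions (not taken from the paper):
      - three genotypes: hom_ref, het, hom_alt
      - each read independently shows the alternate base with a
        genotype-specific probability (binomial likelihood)
      - the prior variant mass is split equally between het and hom_alt
    """
    # Probability that a single read shows the alternate base per genotype.
    p_alt = {"hom_ref": error_rate, "het": 0.5, "hom_alt": 1.0 - error_rate}
    prior = {
        "hom_ref": 1.0 - prior_variant,
        "het": prior_variant / 2.0,
        "hom_alt": prior_variant / 2.0,
    }

    # Work in log space; the binomial coefficient is identical for every
    # genotype, so it cancels in the normalization and is omitted.
    log_joint = {
        g: math.log(prior[g])
        + n_alt * math.log(p_alt[g])
        + n_ref * math.log(1.0 - p_alt[g])
        for g in p_alt
    }

    # Subtract the maximum before exponentiating to avoid underflow
    # at high coverage.
    m = max(log_joint.values())
    weights = {g: math.exp(v - m) for g, v in log_joint.items()}
    total = sum(weights.values())

    # Posterior mass on the two variant genotypes.
    return (weights["het"] + weights["hom_alt"]) / total


if __name__ == "__main__":
    # Made-up read counts (reference reads, alternate reads) at three positions.
    for counts in [(30, 0), (20, 5), (15, 15)]:
        print(counts, variant_posterior(*counts))
```

Under these assumptions, positions would be ranked by the returned value and called as candidate SNPs above some chosen threshold; the threshold, like the priors, would need to be set for the data at hand.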