Li Rong, Zhong Dexing, Liu Ruiling, Lv Hongqiang, Zhang Xinman, Liu Jun, Han Jiuqiang
Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an 710049, PR China.
J Theor Biol. 2017 Feb 21;415:84-89. doi: 10.1016/j.jtbi.2016.11.022. Epub 2016 Nov 29.
Regulatory single nucleotide polymorphisms (rSNPs), a class of functional noncoding genetic variants, can affect gene expression through regulatory mechanisms and are thought to be associated with increased susceptibility to complex diseases. Here we present a novel computational approach to identify potential rSNPs. Most other rSNP-finding methods rest on the hypothesis that SNPs causing large allele-specific changes in transcription factor binding affinity are more likely to have regulatory functions. In contrast, we train classifiers on a set of documented, experimentally verified rSNPs and nonfunctional background SNPs, so that the discriminating features are learned from the data. To characterize variants, we analyze an extensive range of characteristics, including sequence context, DNA structure, and evolutionary conservation. A support vector machine is used to build the classifier model, together with an ensemble method to deal with unbalanced data. 10-fold cross-validation shows that our method achieves a sensitivity of ~78% and a specificity of ~82%. Furthermore, our method performs better at handling false positives than some other algorithms based on the aforementioned hypothesis. The original data and the MATLAB source code are available at https://sourceforge.net/projects/rsnppredict/.
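To make the idea of an ensemble for unbalanced data concrete, the sketch below shows one common scheme: the majority class (background SNPs) is split into chunks of roughly the size of the minority class (verified rSNPs), one base classifier is trained per balanced chunk, and predictions are combined by majority vote. This is an illustration under stated assumptions, not the authors' MATLAB implementation. The abstract does not specify the ensemble scheme, so the partitioning is assumed; all function names are hypothetical; and a nearest-centroid learner stands in for the support vector machine so that the example needs only the Python standard library.

```python
import random
from collections import Counter


def balanced_subsets(minority, majority, seed=0):
    """Pair all minority samples with disjoint, similarly sized majority chunks.

    Assumption: this partitioning scheme is illustrative; the abstract does
    not describe how the ensemble is built.
    """
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    n_chunks = max(1, len(shuffled) // max(1, len(minority)))
    chunks = [shuffled[i::n_chunks] for i in range(n_chunks)]
    return [(minority, chunk) for chunk in chunks]


def train_centroid(pos, neg):
    """Stand-in base learner (nearest centroid). NOT an SVM; swap in an SVM
    trainer with the same (pos, neg) -> predict(x) interface if available."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_pos, c_neg = centroid(pos), centroid(neg)

    def predict(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        return 1 if d_pos < d_neg else 0

    return predict


def train_ensemble(pos, neg, train=train_centroid, seed=0):
    """Train one base classifier per balanced subset."""
    return [train(p, n) for p, n in balanced_subsets(pos, neg, seed)]


def ensemble_predict(models, x):
    """Majority vote over the base classifiers; ties go to class 0."""
    votes = Counter(m(x) for m in models)
    return 1 if votes[1] > votes[0] else 0


def sensitivity_specificity(y_true, y_pred):
    """Return (TP / (TP + FN), TN / (TN + FP)).

    Assumes both classes occur in y_true; otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

In a cross-validation setting, `train_ensemble` would be called on each training fold and `sensitivity_specificity` on the held-out fold, which is how figures such as the reported ~78% sensitivity and ~82% specificity are typically obtained.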