Laboratory of Computational Biology, Centre for DNA Fingerprinting & Diagnostics, Bldg7, Gruhakalpa, Nampally, Hyderabad, 500001, Andhra Pradesh, India.
Hum Mutat. 2012 Feb;33(2):332-7. doi: 10.1002/humu.21642. Epub 2011 Dec 9.
Variations are mostly due to nonsynonymous single nucleotide polymorphisms (nsSNPs), some of which are associated with disease. The phenotypic effects of a large number of nsSNPs have not been characterized. Although several methods have been developed to classify nsSNPs as "disease" or "neutral," there is still a need for methods with improved prediction accuracy. We therefore developed a support vector machine (SVM) based method named Hansa, which uses a novel set of discriminatory features to classify nsSNPs into disease (pathogenic) and benign (neutral) types. Validation on a benchmark dataset, and further on an independent dataset of well-characterized known disease and neutral mutations, shows that Hansa outperforms the other known methods. For example, fivefold cross-validation on the benchmark HumVar dataset shows that at a false positive rate (FPR) of 20%, Hansa yields a true positive rate (TPR) of 82%, which is about 10% higher than that of the best-known method. Hansa is available as a web server at http://hansa.cdfd.org.in:8080.
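The headline figure above is a TPR read off at a fixed FPR. As a minimal sketch of how such a number can be computed from classifier scores, the Python below scans decision thresholds and keeps the highest TPR whose FPR stays within the limit. It is not Hansa's code: the function name `tpr_at_fpr` and the toy scores and labels are hypothetical and chosen only for illustration.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.20):
    """Return the highest TPR reachable while keeping FPR <= max_fpr.

    scores: classifier outputs, higher meaning more disease-like (assumption).
    labels: 1 for disease, 0 for neutral.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one disease and one neutral example")

    best_tpr = 0.0
    # Try every distinct score as a threshold: predict disease if score >= t.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if fp / n_neg <= max_fpr:
            best_tpr = max(best_tpr, tp / n_pos)
    return best_tpr


# Hypothetical toy data: 5 disease (1) and 5 neutral (0) variants.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

toy_tpr = tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.20)
print(f"TPR at FPR <= 20%: {toy_tpr:.2f}")
```

In a cross-validation setting, the same calculation would typically be applied to the held-out predictions of each fold, or to the pooled held-out predictions; which of these the authors used is not stated in the abstract.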