SIFT Indel: predictions for the functional effects of amino acid insertions/deletions in proteins.
Author information
Hu Jing, Ng Pauline C
Affiliation
Department of Mathematics and Computer Science, Franklin and Marshall College, Lancaster, Pennsylvania, United States of America.
Publication information
PLoS One. 2013 Oct 23;8(10):e77940. doi: 10.1371/journal.pone.0077940. eCollection 2013.
Indels in the coding region of a gene can cause either frameshifts or amino acid insertions/deletions. Frameshifting indels have a length that is not divisible by 3 and therefore shift the reading frame. Indels whose length is divisible by 3 cause amino acid insertions/deletions or block substitutions; we call these 3n indels. The amino acid changes resulting from 3n indels may affect protein function. We therefore constructed a SIFT Indel prediction algorithm for 3n indels, which achieves 82% accuracy, 81% sensitivity, 82% specificity, 82% precision, a Matthews correlation coefficient (MCC) of 0.63, and an area under the curve (AUC) of 0.87 by 10-fold cross-validation. We have previously published a prediction algorithm for frameshifting indels. The rules for predicting 3n indels differ from those for predicting frameshifting indels and reflect the biological differences between these two types of variation. We applied SIFT Indel to human 3n indels from the 1000 Genomes Project and the Exome Sequencing Project and found that common variants are less likely to be deleterious than rare variants. The SIFT Indel prediction algorithm for 3n indels is available at http://sift-dna.org/