Anggreainy Maria Susan, Widyanto M Rahmat, Widjaja Belawati H, Soedarsono Nurtami
Faculty of Computer Science, Universitas Indonesia, Depok Campus, West Java 16424, Indonesia.
Faculty of Dentistry, Universitas Indonesia, Salemba Campus, Jakarta 10430, Indonesia.
Adv Bioinformatics. 2018 Jul 29;2018:8602513. doi: 10.1155/2018/8602513. eCollection 2018.
We computed locus similarity by measuring the fuzzy intersection between each individual locus and the corresponding reference locus, and then used these locus similarities to compute the Combined DNA Index System (CODIS) short tandem repeat (STR) DNA similarity. Fuzzy intersection makes the CODIS STR-DNA similarity calculation more robust to the imprecision caused by noise from the polymerase chain reaction (PCR) machine. We also proposed the shifted convoluted Gaussian fuzzy number (SCGFN) and the Gaussian fuzzy number (GFN) to represent each locus value, as an improvement on the triangular fuzzy number (TFN) used in previous research. Compared with the TFN, the GFN represents the uncertainty of locus information more realistically because that uncertainty is assumed to follow a Gaussian distribution. The original GFN is then convolved with the distribution of locus information for a particular ethnic group to produce the SCGFN, which represents ethnic information better than the original GFN. Experiments were conducted on three cases: people with family relationships, people of the same tribe, and specific tribal populations. Analysis of variance (ANOVA) shows a significant difference in similarity among SCGFN, GFN, and TFN at the 95% confidence level. Tukey's post hoc test shows that SCGFN yields higher similarity than GFN and TFN, indicating that it performs better than both methods. The proposed method makes CODIS STR-DNA similarity calculation more robust to noise and performs better for CODIS similarity calculations involving familial and tribal relationships.
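The abstract does not specify the membership functions, their widths, how the two alleles at a locus are matched, or how locus similarities are combined. The following Python sketch fills those gaps with explicit assumptions, purely to illustrate the idea: each allele value is fuzzified as a symmetric TFN or GFN (the widths `half_width=1.0` and `sigma=0.5` are hypothetical), locus similarity is the height of the fuzzy intersection, max over x of min(μ_A(x), μ_B(x)), averaged over the better of the two allele pairings, and the CODIS similarity is the mean over shared loci. Microvariant alleles such as 9.3 are treated as plain numbers, and all function names are illustrative rather than the authors'.

```python
import math
from typing import Callable, Dict, Tuple

Membership = Callable[[float], float]
Profile = Dict[str, Tuple[float, float]]  # locus name -> (allele 1, allele 2)


def tfn(center: float, half_width: float = 1.0) -> Membership:
    """Symmetric triangular fuzzy number peaking at an allele value (width is an assumption)."""
    return lambda x: max(0.0, 1.0 - abs(x - center) / half_width)


def gfn(center: float, sigma: float = 0.5) -> Membership:
    """Gaussian fuzzy number, membership exp(-(x - c)^2 / (2 sigma^2)) (sigma is an assumption)."""
    return lambda x: math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def intersection_height(mu_a: Membership, mu_b: Membership,
                        a: float, b: float, steps: int = 2001) -> float:
    """Height of the fuzzy intersection, max over x of min(mu_a(x), mu_b(x)).

    For unimodal fuzzy numbers peaking at a and b, both memberships are
    non-decreasing towards [a, b] from either side, so the maximum lies in
    that interval; an odd number of grid points includes the exact midpoint.
    """
    lo, hi = min(a, b), max(a, b)
    best = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        best = max(best, min(mu_a(x), mu_b(x)))
    return best


def locus_similarity(alleles_a: Tuple[float, float], alleles_b: Tuple[float, float],
                     make_fn: Callable[[float], Membership]) -> float:
    """Mean allele-level intersection height under the better of the two allele pairings."""
    (a1, a2), (b1, b2) = alleles_a, alleles_b

    def h(x: float, y: float) -> float:
        return intersection_height(make_fn(x), make_fn(y), x, y)

    straight = (h(a1, b1) + h(a2, b2)) / 2.0
    crossed = (h(a1, b2) + h(a2, b1)) / 2.0
    return max(straight, crossed)


def profile_similarity(profile_a: Profile, profile_b: Profile,
                       make_fn: Callable[[float], Membership]) -> float:
    """Hypothetical CODIS similarity: mean locus similarity over the loci both profiles share."""
    shared = sorted(profile_a.keys() & profile_b.keys())
    if not shared:
        raise ValueError("profiles share no loci")
    total = sum(locus_similarity(profile_a[k], profile_b[k], make_fn) for k in shared)
    return total / len(shared)


if __name__ == "__main__":
    # Hypothetical allele values for three CODIS loci (not data from the paper).
    person = {"D8S1179": (12, 13), "TH01": (6, 9.3), "FGA": (21, 24)}
    relative = {"D8S1179": (12, 14), "TH01": (6, 9.3), "FGA": (22, 24)}
    print(f"TFN: {profile_similarity(person, relative, tfn):.3f}")  # TFN: 0.833
    print(f"GFN: {profile_similarity(person, relative, gfn):.3f}")  # GFN: 0.869
```

With these hypothetical profiles, a one-repeat mismatch scores 0.5 under the TFN but exp(-0.5), about 0.607, under the GFN, because the Gaussian tails decay more gently. This illustrates the mechanism only and is not a result from the paper.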
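The abstract describes the SCGFN only as a GFN convolved with the distribution of an ethnic group's locus information. One simple reading, which is our assumption and not the authors' stated formula, is sketched below in Python: approximate the population's allele distribution at a locus by a Gaussian with the same mean m and variance s², convolve it with the GFN, shift the result back by m so it stays centred on the observed allele, and rescale it to peak membership 1. Because convolving two Gaussians adds their variances, the hypothetical SCGFN is then a GFN with spread sqrt(σ² + s²). The function names and the allele frequencies are hypothetical.

```python
import math
from typing import Dict, Tuple


def population_moments(allele_freqs: Dict[float, float]) -> Tuple[float, float]:
    """Mean and variance of a locus's allele values in one population (frequencies are normalised)."""
    total = sum(allele_freqs.values())
    mean = sum(v * p for v, p in allele_freqs.items()) / total
    var = sum(p * (v - mean) ** 2 for v, p in allele_freqs.items()) / total
    return mean, var


def scgfn_sigma(sigma: float, allele_freqs: Dict[float, float]) -> float:
    """Spread of the hypothetical SCGFN.

    Assumption: the population's allele distribution is approximated by
    N(m, s^2). Convolving the GFN shape N(c, sigma^2) with it gives a
    Gaussian shape N(c + m, sigma^2 + s^2); shifting back by m and rescaling
    the peak to 1 leaves a GFN centred at c with only its spread widened.
    """
    _, var = population_moments(allele_freqs)
    return math.sqrt(sigma ** 2 + var)


def gaussian_similarity(a: float, b: float, sigma: float) -> float:
    """Intersection height of two equal-width Gaussian fuzzy numbers centred at a and b.

    The memberships cross at (a + b) / 2, giving exp(-(a - b)^2 / (8 sigma^2)).
    """
    return math.exp(-((a - b) ** 2) / (8.0 * sigma ** 2))


if __name__ == "__main__":
    # Hypothetical D8S1179 allele frequencies for one population (not from the paper).
    freqs = {10: 0.1, 11: 0.1, 12: 0.2, 13: 0.3, 14: 0.2, 15: 0.1}
    s = scgfn_sigma(0.5, freqs)
    print(f"SCGFN sigma: {s:.3f}")                                             # 1.503
    print(f"GFN similarity 12 vs 13:   {gaussian_similarity(12, 13, 0.5):.3f}")  # 0.607
    print(f"SCGFN similarity 12 vs 13: {gaussian_similarity(12, 13, s):.3f}")    # 0.946
```

Under this assumption the SCGFN is simply a wider GFN, so any allele mismatch scores higher than under the plain GFN; the abstract does not state whether the authors' construction behaves this way.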