Department of Computer Science and Engineering, East West University Bangladesh, Dhaka, Bangladesh.
Department of Computer Science and Engineering, Notre Dame University Bangladesh, Dhaka, Bangladesh.
Interdiscip Sci. 2017 Dec;9(4):512-527. doi: 10.1007/s12539-016-0158-7. Epub 2016 Mar 28.
Damage to or breaks in DNA may alter the characteristics of genomes and cause various diseases. In this work we construct a system that incorporates a maximum likelihood-based probabilistic formula to assess the number of damage events that have occurred in any DNA sequence. The approach has been benchmarked progressively on simulated data sets so that the outcomes can be compared with a ground truth or reference value. First, the order of the sequence data set is checked using the statistical cumulative sum (STACUMSUM). Prior and posterior probabilities are then computed for the verified sequences to estimate the percentages of breaks and mutations, after which maximum-likelihood estimation determines the exact numbers and positions of the breaks and detections. In database manipulation, one factor that determines the orientation and order of a sequence is the geometric distance between consecutive sequences; this distance is measured to obtain a smooth representation of the genome or DNA sequences. Finally, we compared the performance of our system, StrucBreak, with DAMBE5 (A Comprehensive Software Package for Data Analysis in Molecular Biology and Evolution): in terms of time and space complexity, StrucBreak is much faster and consumes much less space owing to its algorithmic approach.
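The abstract does not give the STACUMSUM formula, so the following is only a minimal sketch of a generic cumulative-sum (CUSUM) order check, not StrucBreak's implementation. It assumes each sequence is summarized by a single numeric score (here GC content, a hypothetical choice) and computes C_t as the running sum of deviations from the mean; the index where |C_t| peaks marks where the ordered series shifts level. The helper names `gc_content`, `cusum`, and `change_point` are hypothetical.

```python
from itertools import accumulate


def gc_content(seq: str) -> float:
    """Fraction of G and C bases; a hypothetical per-sequence score."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cusum(values: list[float]) -> list[float]:
    """C_t = sum_{i<=t} (x_i - mean(x)): cumulative sum of deviations from the mean."""
    mean = sum(values) / len(values)
    return list(accumulate(v - mean for v in values))


def change_point(values: list[float]) -> int:
    """Index at which |C_t| peaks; the level of the series shifts right after it."""
    c = cusum(values)
    return max(range(len(c)), key=lambda i: abs(c[i]))


if __name__ == "__main__":
    seqs = ["ATAT", "AATT", "TTAA", "GCGC", "GGCC", "CCGG"]
    scores = [gc_content(s) for s in seqs]
    print(scores)                # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    print(cusum(scores))         # [-0.5, -1.0, -1.5, -1.0, -0.5, 0.0]
    print(change_point(scores))  # 2
```

A flat CUSUM curve suggests the ordered data set is homogeneous, while a sharp peak flags a point where the ordering or the underlying sequences change.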
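The prior/posterior step and the maximum-likelihood step can be illustrated with a standard Beta-Binomial model. This is our assumption, not necessarily the formula used by StrucBreak. For n aligned sites with k observed events, the binomial maximum-likelihood estimate of the per-site rate is k / n, and a Beta(a, b) prior yields a Beta(a + k, b + n - k) posterior with mean (a + k) / (a + b + n). The gap character used to mark a break, and the names `estimate` and `SiteEstimate`, are hypothetical.

```python
from dataclasses import dataclass

GAP = "-"  # hypothetical encoding: a break site appears as a gap in the aligned read


@dataclass
class SiteEstimate:
    positions: list[int]   # 0-based sites where the event was observed
    mle: float             # binomial maximum-likelihood estimate k / n
    posterior_mean: float  # mean of the Beta(a + k, b + n - k) posterior


def estimate(reference: str, observed: str, kind: str,
             a: float = 1.0, b: float = 1.0) -> SiteEstimate:
    """Estimate the per-site rate of breaks or mutations under a Beta(a, b) prior."""
    if len(reference) != len(observed):
        raise ValueError("sequences must be aligned to the same length")
    if kind == "break":
        hits = [i for i, o in enumerate(observed) if o == GAP]
    elif kind == "mutation":
        # A mutation is a non-gap base that differs from the reference base.
        hits = [i for i, (r, o) in enumerate(zip(reference, observed))
                if o != GAP and o != r]
    else:
        raise ValueError("kind must be 'break' or 'mutation'")
    n, k = len(reference), len(hits)
    return SiteEstimate(positions=hits, mle=k / n,
                        posterior_mean=(a + k) / (a + b + n))


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    obs = "ACGAAC-TAC"
    print(estimate(ref, obs, "break"))     # positions=[6], mle=0.1
    print(estimate(ref, obs, "mutation"))  # positions=[3], mle=0.1
```

With the uniform Beta(1, 1) prior, the posterior mean is pulled slightly toward 0.5 when few sites are observed and converges to the maximum-likelihood estimate as n grows.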
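The abstract does not define the geometric distance between consecutive sequences. A minimal sketch, assuming each sequence is mapped to a hypothetical feature vector of nucleotide frequencies and neighbors are compared by Euclidean distance, could look like this (the names `composition` and `consecutive_distances` are illustrative only):

```python
import math


def composition(seq: str) -> list[float]:
    """Hypothetical feature vector: relative frequencies of A, C, G, T."""
    seq = seq.upper()
    return [seq.count(base) / len(seq) for base in "ACGT"]


def consecutive_distances(seqs: list[str]) -> list[float]:
    """Euclidean distance between the feature vectors of each pair of neighboring sequences."""
    vecs = [composition(s) for s in seqs]
    return [math.dist(u, v) for u, v in zip(vecs, vecs[1:])]


if __name__ == "__main__":
    # Both neighboring pairs are about 0.7071 apart.
    print(consecutive_distances(["AAAA", "AACC", "CCCC"]))
```

Small, even distances between neighbors indicate a smooth ordering; a sudden large distance suggests that two consecutive sequences may be misordered or misoriented.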