Konovalov Dmitry A, Bajema Nigel, Litow Bruce
School of Information Technology, James Cook University, Townsville, QLD, Australia.
Bioinformatics. 2005 Oct 15;21(20):3912-7. doi: 10.1093/bioinformatics/bti642. Epub 2005 Aug 23.
The problem of reconstructing full sibling groups from DNA marker data remains a significant challenge for computational biology. A recently published heuristic algorithm based on Mendelian exclusion rules and the Simpson index was successfully applied to the full sibship reconstruction (FSR) problem. However, the computational complexity of this so-called SIMPSON algorithm is unknown, which calls its range of applicability into question.
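The abstract does not spell out the exclusion rules. As a rough illustration only, the Python sketch below shows one common form of a single-locus Mendelian exclusion test for a candidate full-sib group: the group is kept only if some pair of diploid parental genotypes could have produced every offspring genotype, and a multilocus group passes only if every locus passes. The function names and the `UNSEEN` placeholder (standing for a parental allele that no sampled offspring carries) are hypothetical. This is not the SIMPSON or MS implementation, and it leaves out the Simpson-index step entirely.

```python
from itertools import combinations_with_replacement, product
from typing import Hashable, Sequence, Tuple

Genotype = Tuple[Hashable, Hashable]

# Hypothetical placeholder for a parental allele that no offspring in the group carries.
UNSEEN = None


def child_possible(p1: Sequence, p2: Sequence, child: Genotype) -> bool:
    """True if the child can take one allele from parent p1 and the other from p2."""
    x, y = child
    return (x in p1 and y in p2) or (y in p1 and x in p2)


def locus_consistent(genotypes: Sequence[Genotype]) -> bool:
    """Mendelian exclusion test at one locus: could two diploid parents
    have produced every genotype in the group?"""
    alleles = {a for g in genotypes for a in g}
    if len(alleles) > 4:
        return False  # full sibs share two diploid parents, so at most four alleles
    # Every possible parental genotype built from the observed alleles plus UNSEEN.
    candidates = list(combinations_with_replacement(list(alleles) + [UNSEEN], 2))
    return any(
        all(child_possible(p1, p2, g) for g in genotypes)
        for p1, p2 in product(candidates, repeat=2)
    )


def group_consistent(group: Sequence[Sequence[Genotype]]) -> bool:
    """A group (list of individuals, each a list of per-locus genotypes)
    passes only if every locus passes; loci are treated independently."""
    if not group:
        return True
    n_loci = len(group[0])
    return all(locus_consistent([ind[k] for ind in group]) for k in range(n_loci))


print(locus_consistent([(1, 1), (2, 2)]))          # True: parents 1/2 and 1/2
print(locus_consistent([(1, 1), (2, 2), (3, 3)]))  # False: each homozygote needs its allele in both parents
```

The brute-force search over parental genotypes stays small because the four-allele check runs first: at most five alleles (four observed plus `UNSEEN`) give 15 candidate genotypes per parent.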
We present a modified version of the SIMPSON (MS) algorithm that scales as O(n^3) and achieves the same or better accuracy than the original algorithm. The performance of the MS algorithm was tested on a variety of simulated diploid population samples to verify its complexity and its significantly improved efficiency (e.g. 100 times faster than SIMPSON in some cases). It is also shown that, in theory, the SIMPSON algorithm runs in non-polynomial time, which significantly limits its usefulness. Simulation experiments further confirmed that SIMPSON could run in O(n^a) time, where a > 3.
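To make the idea of an empirical exponent concrete: if running time grows roughly as t ≈ c·n^a, then log t = log c + a·log n, so the slope of a straight-line fit to log t against log n estimates a. The sketch below assumes that power-law form. It is a generic scaling check on synthetic timings, not the authors' benchmarking procedure, and `empirical_exponent` is a hypothetical helper.

```python
import math
from typing import Sequence


def empirical_exponent(sizes: Sequence[float], times: Sequence[float]) -> float:
    """Least-squares slope of log(time) against log(n).

    If time ~ c * n**a, then log(time) = log(c) + a * log(n), so the slope estimates a.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Synthetic timings that grow exactly as n**3 (illustrative, not measured data).
sizes = [100, 200, 400, 800]
times = [2e-6 * n ** 3 for n in sizes]
print(round(empirical_exponent(sizes, times), 6))  # 3.0
```

With real timings the fitted slope only approximates a, and lower-order terms can bias it at small n, so such fits are usually made over the largest sample sizes that are practical to run.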
Computer code written in Java is available upon request from the first author.