Deorowicz Sebastian, Grabowski Szymon, Ochoa Idoia, Hernaez Mikel, Weissman Tsachy
Institute of Informatics, Silesian University of Technology, Akademicka 16, Gliwice, 44-100 Poland.
Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Poland.
Bioinformatics. 2016 Apr 1;32(7):1115-7. doi: 10.1093/bioinformatics/btv704. Epub 2015 Nov 28.
Data compression is crucial for the effective handling of genomic data. Among several recently published algorithms, ERGC appears surprisingly good, easily beating all of its competitors.
We evaluated ERGC and the previously proposed algorithms GDC and iDoComp, which were used for comparison in the original paper, on a broad data set of 12 assemblies of the human genome (rather than the four used in the original paper). ERGC wins only when one of the genomes (reference or target) contains mixed-case letters, which is true only for the two Korean genomes. In all other cases, ERGC is on average an order of magnitude worse than GDC and iDoComp.
sebastian.deorowicz@polsl.pl, iochoa@stanford.edu
Supplementary data are available at Bioinformatics online.