Spouge John L, Ziegelbauer Joseph M, Gonzalez Mileidy
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Room 6N603, Building 38A, Bethesda, MD 20894 USA.
HIV and AIDS Malignancy Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892 USA.
Algorithms Mol Biol. 2020 Sep 19;15:17. doi: 10.1186/s13015-020-00178-x. eCollection 2020.
Data about herpesvirus microRNA motifs on human circular RNAs suggested the following statistical question. Consider independent random counts, not necessarily identically distributed. Conditioned on the sum, decide whether one of the counts is unusually large. Exact computation of the pvalue leads to a specific algorithmic problem. Given elements in a set with the closure and associative properties and a commutative product without inverses, compute the jackknife (leave-one-out) products ( ).
This article gives a linear-time Jackknife Product algorithm. Its upward phase constructs a standard segment tree for computing segment products like ; its novel downward phase mirrors the upward phase while exploiting the symmetry of and its complement . The algorithm requires storage for elements of and only about products. In contrast, the standard segment tree algorithms require about products for construction and products for calculating each , i.e., about products in total; and a naïve quadratic algorithm using element-by-element products to compute each requires products.
In the herpesvirus application, the Jackknife Product algorithm required 15 min; standard segment tree algorithms would have taken an estimated 3 h; and the quadratic algorithm, an estimated 1 month. The Jackknife Product algorithm has many possible uses in bioinformatics and statistics.
关于人类环状RNA上疱疹病毒微小RNA基序的数据提出了以下统计学问题。考虑独立的随机计数,不一定是同分布的。在总和的条件下,判断其中一个计数是否异常大。p值的精确计算导致了一个特定的算法问题。给定一个具有封闭性和结合性且乘法可交换但无逆元的集合中的元素,计算留一法(Jackknife)乘积( )。
本文给出了一种线性时间的留一法乘积算法。其向上阶段构建一个标准段树来计算诸如 的段乘积;其新颖的向下阶段在利用 及其补集 的对称性的同时镜像向上阶段。该算法需要存储 中的 个元素和仅约 个乘积。相比之下,标准段树算法构建时需要约 个乘积,计算每个 时需要 个乘积,即总共约 个乘积;而使用逐个元素乘积来计算每个 的朴素二次算法需要 个乘积。
在疱疹病毒应用中,留一法乘积算法耗时15分钟;标准段树算法估计需要3小时;二次算法估计需要1个月。留一法乘积算法在生物信息学和统计学中有许多可能的用途。