Schimek Michael G, Budinská Eva, Kugler Karl G, Švendová Vendula, Ding Jie, Lin Shili
Stat Appl Genet Mol Biol. 2015 Jun;14(3):311-6. doi: 10.1515/sagmb-2014-0093.
High-throughput sequencing techniques are increasingly affordable and produce massive amounts of data. Together with other high-throughput technologies, such as microarrays, they have filled databases with an enormous amount of resources. The collection of these valuable data has been routine for more than a decade. Although the technologies differ, many experiments share the same goal. For instance, the aims of RNA-seq studies often coincide with those of differential gene expression experiments based on microarrays. It would therefore be logical to utilize all available data. However, there is a lack of biostatistical tools for integrating results obtained from different technologies. Although diverse technological platforms produce different raw data, experiments with the same goal have one thing in common: all their outcomes can be transformed into a platform-independent data format, namely rankings of the same set of items. Here we present the R package TopKLists, which allows for statistical inference on the lengths of informative (top-k) partial lists, for stochastic aggregation of full or partial lists, and for graphical exploration of the input and consolidated output. A graphical user interface has also been implemented to provide access to the underlying algorithms. To illustrate the applicability and usefulness of the package, we integrated microRNA data of non-small cell lung cancer across different measurement techniques and drew conclusions. The package can be obtained from CRAN under an LGPL-3 license.
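The idea of rankings as a platform-independent format can be illustrated with a minimal Python sketch. It is not the TopKLists implementation and does not use the package's API: the function names, the microRNA labels, and all scores below are hypothetical, and a simple Borda-style mean-rank rule stands in for the stochastic aggregation described above.

```python
from statistics import mean


def scores_to_ranks(scores, descending=True):
    """Map item -> score to item -> rank, where rank 1 is the top item."""
    ordered = sorted(scores, key=lambda item: scores[item], reverse=descending)
    return {item: position for position, item in enumerate(ordered, start=1)}


def mean_rank_aggregate(rankings):
    """Order items by their mean rank across full rankings of the same items.

    Ties in mean rank are broken alphabetically so the result is deterministic.
    """
    items = set(rankings[0])
    if any(set(ranking) != items for ranking in rankings):
        raise ValueError("all rankings must cover the same set of items")
    mean_ranks = {item: mean(r[item] for r in rankings) for item in items}
    return sorted(items, key=lambda item: (mean_ranks[item], item))


# Hypothetical raw outcomes from three platforms, each on its own scale.
microarray_fold_change = {"miR-a": 2.9, "miR-b": 1.4, "miR-c": 0.3, "miR-d": 2.1}
sequencing_p_value = {"miR-a": 0.004, "miR-b": 0.03, "miR-c": 0.20, "miR-d": 0.001}
qpcr_score = {"miR-a": 5.0, "miR-b": 3.2, "miR-c": 0.5, "miR-d": 2.8}

# Larger fold changes and scores rank higher; smaller p-values rank higher.
rank_array = scores_to_ranks(microarray_fold_change, descending=True)
rank_seq = scores_to_ranks(sequencing_p_value, descending=False)
rank_qpcr = scores_to_ranks(qpcr_score, descending=True)

consensus = mean_rank_aggregate([rank_array, rank_seq, rank_qpcr])
print(consensus)
```

Once each outcome is reduced to a rank, the three platforms become directly comparable even though their raw scales (fold change, p-value, arbitrary score) are not. The sketch handles full lists only; truncating each list to an informative top-k length, as the package does through statistical inference, is deliberately left out.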