Jung Inuk, Park Jong Chan, Kim Sun
Interdisciplinary Program in Bioinformatics, Republic of Korea; Bioinformatics Institute, Republic of Korea.
Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea.
Comput Biol Chem. 2014 Jun;50:60-7. doi: 10.1016/j.compbiolchem.2014.01.008. Epub 2014 Jan 23.
Piwi-interacting RNAs (piRNAs) are recently discovered endogenous small non-coding RNAs. piRNAs protect the genome from invasive transposable elements (TEs) and sustain genome integrity in germ cell lineages. Small RNA-sequencing data can be used to detect piRNA activation in a cell under a specific condition. However, identifying cell-specific piRNA activation requires sophisticated computational methods. At present, there is only one computational method, proTRAC, for locating activated piRNAs from sequencing data. proTRAC detects piRNA clusters through a probabilistic analysis that assumes a uniform distribution. Unfortunately, we were not able to locate activated piRNAs in our proprietary sequencing data from chicken germ cells using proTRAC. After careful investigation of the data sets, we found that piRNA cluster detection may not be able to assume a uniform, or indeed any, statistical distribution. Furthermore, small RNA-seq data contain many different types of RNA, which previous studies did not carefully take into account. To improve piRNA cluster identification, we developed piClust, which uses a density-based clustering approach without assuming any parametric distribution. Previous studies have shown that piRNAs have a strong tendency to form clusters in syntenic regions of the genome. Thus, the density-based clustering approach is effective and robust to the presence of non-piRNAs or noise in the data. In experiments with piRNA data from human, mouse, rat and chicken, piClust was able to detect piRNA clusters in total small RNA-seq data from germ cell lines, whereas proTRAC was not. piClust outperformed proTRAC in sensitivity and in running time (up to 200-fold faster). piClust is currently available as a web service at http://epigenomics.snu.ac.kr/piclustweb.
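The abstract does not spell out piClust's clustering procedure. To illustrate what density-based clustering without a parametric distribution can mean for mapped reads, here is a minimal one-dimensional DBSCAN-style sketch. This is a generic technique, not piClust's published algorithm. It assumes reads from a single chromosome strand are given as integer start positions. The function name `density_clusters` and the parameters `eps` (window radius in bp) and `min_pts` (minimum reads per window) are hypothetical and are not piClust's API or defaults. Densely packed reads form clusters, while isolated reads, such as scattered non-piRNA fragments, are left as noise.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple


def density_clusters(positions: List[int], eps: int, min_pts: int) -> List[Tuple[int, int, int]]:
    """Hypothetical 1-D DBSCAN-style clustering of read start positions.

    Returns (start, end, n_reads) for each dense region; isolated reads are noise.
    eps and min_pts are illustrative assumptions, not piClust parameters.
    """
    pts = sorted(positions)
    # Core test: count reads in the closed window [p - eps, p + eps], self included.
    is_core = [
        bisect_right(pts, p + eps) - bisect_left(pts, p - eps) >= min_pts
        for p in pts
    ]

    # Chain core reads: consecutive core reads within eps share a cluster label.
    core_pos: List[int] = []
    core_lab: List[int] = []
    label = -1
    for p, core in zip(pts, is_core):
        if core:
            if not core_pos or p - core_pos[-1] > eps:
                label += 1  # a gap larger than eps starts a new cluster
            core_pos.append(p)
            core_lab.append(label)

    members: Dict[int, List[int]] = defaultdict(list)
    for p, core in zip(pts, is_core):
        if core:
            continue
        # Border read: attach to the nearest core read within eps, otherwise noise.
        j = bisect_left(core_pos, p)
        best = None
        for k in (j - 1, j):
            if 0 <= k < len(core_pos) and abs(core_pos[k] - p) <= eps:
                if best is None or abs(core_pos[k] - p) < abs(core_pos[best] - p):
                    best = k
        if best is not None:
            members[core_lab[best]].append(p)
    for p, lab in zip(core_pos, core_lab):
        members[lab].append(p)

    return [(min(m), max(m), len(m)) for _, m in sorted(members.items())]
```

For example, `density_clusters([100, 102, 105, 108, 110, 118, 5000, 9000, 9003, 9004, 9010, 9012], eps=10, min_pts=4)` returns `[(100, 118, 6), (9000, 9012, 5)]`. The read at 118 joins the first cluster as a border read, and the isolated read at 5000 is treated as noise. No distribution is fitted at any point. Only local read density decides cluster membership, which is the property the abstract relies on for robustness to non-piRNA reads.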