Institute of Anthropology, Johannes Gutenberg-University Mainz, Colonel-Kleinmann-Weg 2, 55099 Mainz, Germany.
BMC Bioinformatics. 2012 Jan 10;13:5. doi: 10.1186/1471-2105-13-5.
Throughout the metazoan lineage, Piwi proteins, which are typically expressed in the gonads, and their guiding piRNAs (~26-32 nt in length) form a protective RNA interference mechanism directed against the propagation of transposable elements (TEs). Most piRNAs are generated from genomic piRNA clusters. Annotating experimentally obtained piRNAs from small RNA/cDNA libraries and detecting genomic piRNA clusters are crucial for a thorough understanding of the still enigmatic piRNA pathway, especially in an evolutionary context. Currently, detection of piRNA clusters relies on bioinformatics rather than on detection and sequencing of primary piRNA cluster transcripts, and the stringency of the methods applied in different studies differs considerably. Additionally, not all important piRNA cluster characteristics have been taken into account during bioinformatic processing. Depending on the applied method, this can lead to: i) an accidental underrepresentation of TE-related piRNAs, ii) a failure to detect duplicated clusters harboring few or no single-copy loci, and iii) false positive annotation of clusters that are in fact just accumulations of multi-copy loci corresponding to frequently mapped reads, but that are not transcribed into piRNA precursors.
We developed software that detects and analyses piRNA clusters (proTRAC, probabilistic TRacking and Analysis of Clusters) based on quantifiable deviations from a hypothetical uniform distribution with respect to the decisive piRNA cluster characteristics. We used piRNA sequences from human, macaque, mouse and rat to identify piRNA clusters in the respective species with proTRAC, and compared the results with the piRNA cluster annotation from piRNABank and with the results generated by different previously applied methods. proTRAC identified clusters not annotated at piRNABank and rejected annotated clusters based on the absence of important features such as strand asymmetry. We further show that proTRAC detects clusters that are passed over if a minimum number of single-copy piRNA loci is required, and that proTRAC assigns more sequence reads per cluster since it does not exclude frequently mapped reads from the analysis.
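The general idea of flagging clusters by their deviation from a uniform distribution can be illustrated with a minimal sketch. This is not proTRAC's implementation: the Poisson null model, the sliding-window scan, the function names (`poisson_tail`, `candidate_windows`) and every threshold below are assumptions chosen only for illustration. The sketch asks whether a window holds more mapped loci than expected if loci were spread uniformly over the genome, and whether those loci show strand asymmetry.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)          # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i            # P(X = i) derived from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def candidate_windows(loci, genome_length, window=5000, step=1000,
                      alpha=0.01, min_strand_bias=0.75):
    """Scan sliding windows for locus density and strand asymmetry.

    loci: list of (position, strand) tuples, strand being "+" or "-".
    All default parameter values are illustrative assumptions.
    Returns a list of (start, end, n_loci, p_value, strand_bias).
    """
    # Expected loci per window if all loci were uniformly distributed.
    expected = len(loci) * window / genome_length
    hits = []
    for start in range(0, max(1, genome_length - window + 1), step):
        strands = [s for pos, s in loci if start <= pos < start + window]
        n = len(strands)
        if n == 0:
            continue
        p_value = poisson_tail(n, expected)
        plus = strands.count("+")
        strand_bias = max(plus, n - plus) / n   # fraction on the main strand
        if p_value < alpha and strand_bias >= min_strand_bias:
            hits.append((start, start + window, n, p_value, strand_bias))
    return hits
```

In this sketch, overlapping windows that pass both filters would still need to be merged into a single cluster call, and multi-copy loci are counted like any other locus instead of being discarded, in keeping with the point above that frequently mapped reads remain in the analysis.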
With proTRAC we provide a reliable tool for detection, visualization and analysis of piRNA clusters. Detected clusters are well supported by comprehensible probabilistic parameters and retain a maximum amount of information, thus overcoming the present conflict of sensitivity and specificity in piRNA cluster detection.