Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19714, USA.
Delaware Biotechnology Institute, University of Delaware, Newark, DE, 19714, USA.
New Phytol. 2018 Nov;220(3):851-864. doi: 10.1111/nph.15349. Epub 2018 Jul 18.
Little is known about the characteristics and function of reproductive phased, secondary, small interfering RNAs (phasiRNAs) in the Poaceae, despite the availability of significant genomic resources, experimental data, and a growing number of computational tools. We used machine-learning methods to identify sequence-based and positional features that distinguish phasiRNAs in rice and maize from other small RNAs (sRNAs). We developed Random Forest classifiers that distinguish reproductive phasiRNAs from other sRNAs in complex sets of sequencing data, using sequence-based features (k-mers) and features describing position-specific sequence biases. Classification performance exceeded 80% in accuracy, sensitivity, specificity, and positive predictive value. Feature selection identified important features at both ends of phasiRNAs. We demonstrated that phasiRNAs have strand specificity and position-specific nucleotide biases that potentially influence AGO sorting; we also predicted targets to infer the functions of phasiRNAs, and computationally assessed their sequence characteristics relative to other sRNAs. Our results demonstrate that machine-learning methods effectively identify phasiRNAs despite the lack of characteristic features typically present in precursor loci of other small RNAs, such as sequence conservation or structural motifs. The 5'-end features we identified provide insights into AGO-phasiRNA interactions. We describe a hypothetical model of competition for AGO loading between phasiRNAs of different nucleotide compositions.
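The two feature families and the four performance measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the k-mer size (`k=3`), the number of end positions encoded (`n_pos=3`), the one-hot encoding, and all function names are assumptions chosen for illustration, since the abstract does not specify them.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"

def kmer_counts(seq, k=3):
    """Count overlapping k-mers, reported in a fixed order over all ACGU^k."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(ALPHABET, repeat=k)]

def end_position_features(seq, n_pos=3):
    """One-hot encode the nucleotide at the first and last n_pos positions."""
    positions = list(seq[:n_pos]) + list(seq[-n_pos:])
    return [int(base == nt) for base in positions for nt in ALPHABET]

def featurize(seq, k=3, n_pos=3):
    """Concatenate k-mer counts and end-position features (hypothetical layout)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style reads as well
    return kmer_counts(seq, k) + end_position_features(seq, n_pos)

def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and PPV for binary labels (1 = phasiRNA)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return None rather than divide by zero when a class is absent.
        return num / den if den else None

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }
```

Feature vectors built this way could be passed to any Random Forest implementation, for example scikit-learn's `RandomForestClassifier`, and the predicted labels scored with `classification_metrics`.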