Brunak S, Engelbrecht J, Knudsen S
Department of Structural Properties of Materials, Technical University of Denmark, Lyngby.
J Mol Biol. 1991 Jul 5;220(1):49-65. doi: 10.1016/0022-2836(91)90380-o.
Artificial neural networks have been applied to the prediction of splice site locations in human pre-mRNA. A joint prediction scheme, in which the prediction of transition regions between introns and exons regulates the cutoff level for splice site assignment, was able to predict splice site locations with confidence levels far better than previously reported in the literature. The problem of predicting donor and acceptor sites in human genes is hampered by the presence of large numbers of false positives: here, the distribution of these false splice sites is examined and linked to a possible scenario for the splicing mechanism in vivo. When the presented method detects 95% of the true donor and acceptor sites, it makes less than 0.1% false donor site assignments and less than 0.4% false acceptor site assignments. For the large data set used in this study, this means that on average there are one and a half false donor sites per true donor site and six false acceptor sites per true acceptor site. With the joint assignment method, more than a fifth of the true donor sites and around one fourth of the true acceptor sites could be detected without any accompanying false positive predictions. Highly confident splice sites could not be isolated with a widely used weight matrix method or by separate splice site networks. A complementary relation between the confidence levels of the coding/non-coding network and the separate splice site networks was observed: many weak splice sites have sharp transitions in the coding/non-coding signal, while many stronger splice sites have more ill-defined transitions between coding and non-coding.
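The joint assignment idea can be illustrated with a minimal sketch, in which a sharp coding/non-coding transition lowers the cutoff that a splice site score must reach. The abstract does not give the functional form of this regulation, so the linear rule, the function name `joint_assign`, and the parameters `base_cutoff` and `relief` are all hypothetical and are not the authors' method or values.

```python
from typing import List, Sequence


def joint_assign(
    site_scores: Sequence[float],
    transition_strength: Sequence[float],
    base_cutoff: float = 0.9,  # hypothetical cutoff when no transition is seen
    relief: float = 0.4,       # hypothetical maximum lowering of the cutoff
) -> List[int]:
    """Return positions assigned as splice sites.

    site_scores: output of a separate splice site predictor, in [0, 1].
    transition_strength: sharpness of the coding/non-coding transition
        at the same positions, in [0, 1].
    The linear rule below is an assumption made for illustration only.
    """
    if len(site_scores) != len(transition_strength):
        raise ValueError("inputs must have the same length")

    assigned = []
    for pos, (score, sharpness) in enumerate(zip(site_scores, transition_strength)):
        # A sharper transition lowers the cutoff the site score must reach.
        cutoff = base_cutoff - relief * sharpness
        if score >= cutoff:
            assigned.append(pos)
    return assigned


# A strong site with no transition (position 0) and a weak site with a
# sharp transition (position 1) are both accepted; a weak site without a
# transition (position 2) and a very weak site (position 3) are rejected.
hits = joint_assign([0.95, 0.6, 0.6, 0.2], [0.0, 1.0, 0.0, 1.0])
```

Under these assumed values, a weak site is accepted only when the transition signal is sharp, which mirrors the complementary relation described in the abstract.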