Yu Jingkai, Finley Russell L
Center for Molecular Medicine and Genetics and Department of Biochemistry and Molecular Biology, School of Medicine, Wayne State University, 540 East Canfield, Detroit, MI 48201, USA.
Bioinformatics. 2009 Jan 1;25(1):105-11. doi: 10.1093/bioinformatics/btn597. Epub 2008 Nov 14.
High-throughput experimental and computational methods are generating a wealth of protein-protein interaction data for a variety of organisms. However, data produced by current state-of-the-art methods include many false positives, which can hinder the analyses needed to derive biological insights. One way to address this problem is to assign confidence scores that reflect the reliability and biological significance of each interaction. Most previously described scoring methods use a set of likely true positives to train a model to score all interactions in a dataset. A single positive training set, however, may be biased and not representative of true interaction space.
We demonstrate a method to score protein interactions by utilizing multiple independent sets of training positives to reduce the potential bias inherent in using a single training set. We used a set of benchmark yeast protein interactions to show that our approach outperforms other scoring methods. Our approach can also score interactions across data types, which makes it more widely applicable than many previously proposed methods. We applied the method to protein interaction data from both Drosophila melanogaster and Homo sapiens. Independent evaluations show that the resulting confidence scores accurately reflect the biological significance of the interactions.
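The idea of training on several independent positive sets and combining the results can be illustrated with a minimal sketch. The abstract does not specify the classifier, the features, or how per-set scores are combined, so everything below is an assumption made for illustration: a naive Bayes scorer over hypothetical binary evidence flags, equal class priors, a simple average across models, and toy data. The names `train_naive_bayes`, `score`, and `ensemble_score` are not from the paper.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: each interaction is a tuple of binary
# evidence flags (e.g. "seen in screen A", "co-expressed", ...).
Features = Sequence[int]
Weights = List[Tuple[float, float]]


def train_naive_bayes(positives: List[Features], negatives: List[Features]) -> Weights:
    """Per-feature log-likelihood ratios (flag on, flag off), Laplace-smoothed."""
    n_features = len(positives[0])
    weights: Weights = []
    for j in range(n_features):
        p_on = (sum(x[j] for x in positives) + 1) / (len(positives) + 2)
        n_on = (sum(x[j] for x in negatives) + 1) / (len(negatives) + 2)
        weights.append((math.log(p_on / n_on),
                        math.log((1 - p_on) / (1 - n_on))))
    return weights


def score(weights: Weights, x: Features) -> float:
    """Map the summed log-likelihood ratio to (0, 1), assuming equal priors."""
    llr = sum(w_on if xi else w_off for (w_on, w_off), xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-llr))


def ensemble_score(models: List[Weights], x: Features) -> float:
    """Average the scores of models trained on different positive sets."""
    return sum(score(m, x) for m in models) / len(models)


# Toy data (fabricated for illustration only, not from the study).
negatives = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (0, 0, 0)]
positive_sets = {
    "set_a": [(1, 1, 0), (1, 1, 1), (1, 0, 1)],
    "set_b": [(0, 1, 1), (1, 1, 1), (1, 1, 0)],
}

# One model per positive training set, all sharing the same negatives.
models = [train_naive_bayes(pos, negatives) for pos in positive_sets.values()]

strong = ensemble_score(models, (1, 1, 1))  # all evidence present
weak = ensemble_score(models, (0, 0, 0))    # no evidence present
print(f"strong={strong:.3f} weak={weak:.3f}")
```

Averaging is only one way to combine the per-set models. The point of the sketch is that no single positive set determines the final score, which is the bias-reduction argument made above.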