Active learning for human protein-protein interaction prediction.

Author information

Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA. mop13+

Publication information

BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S57. doi: 10.1186/1471-2105-11-S1-S57.

Abstract

BACKGROUND

Biological processes in cells are carried out by means of protein-protein interactions. Determining whether a pair of proteins interacts by wet-lab experiments is resource-intensive; only about 38,000 interactions, out of a few hundred thousand expected interactions, are known today. Active machine learning can guide the selection of pairs of proteins for future experimental characterization in order to accelerate accurate prediction of the human protein interactome.
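The selection loop sketched in the last sentence above can be illustrated with pool-based active learning. The following is a minimal sketch under stated assumptions, not the authors' method: it uses uncertainty sampling (a generic query strategy, not necessarily one of the four algorithms in this paper), scikit-learn's `RandomForestClassifier` as the learner, and synthetic data from `make_classification` in place of real protein-pair features. The function names `uncertainty_sampling` and `active_learning_loop` and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def uncertainty_sampling(model, X_pool, batch_size):
    """Return positions of the pool rows whose predicted probability is closest to 0.5."""
    proba = model.predict_proba(X_pool)[:, 1]
    return np.argsort(np.abs(proba - 0.5))[:batch_size]


def active_learning_loop(X, y, n_initial=10, batch_size=10, n_rounds=5, seed=0):
    """Grow a labelled set by repeatedly querying the most uncertain pool items."""
    rng = np.random.default_rng(seed)
    # Assumption: a small seed set with known positives and negatives,
    # so that both classes are present when the first model is trained.
    pos = rng.choice(np.flatnonzero(y == 1), size=n_initial // 2, replace=False)
    neg = rng.choice(np.flatnonzero(y == 0), size=n_initial // 2, replace=False)
    labelled = np.concatenate([pos, neg])
    pool = np.setdiff1d(np.arange(len(X)), labelled)

    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    for _ in range(n_rounds):
        model.fit(X[labelled], y[labelled])
        query = pool[uncertainty_sampling(model, X[pool], batch_size)]
        # Reading y[query] stands in for the wet-lab experiment on the queried pairs.
        labelled = np.concatenate([labelled, query])
        pool = np.setdiff1d(pool, query)

    model.fit(X[labelled], y[labelled])
    return model, labelled


# Synthetic, imbalanced stand-in data (not protein features from the paper).
X, y = make_classification(n_samples=500, n_features=10, weights=[0.8, 0.2],
                           random_state=0)
model, labelled = active_learning_loop(X, y)
print(len(labelled), "pairs labelled after 5 query rounds")
```

In a real setting the oracle step would be an experiment or a curated database lookup, and the feature vectors would describe protein pairs rather than random points.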

RESULTS

Random forest (RF) has previously been shown to be effective for predicting protein-protein interactions. Here, four different active learning algorithms were devised for selecting the protein pairs used to train the RF. With labels for as few as 500 protein pairs selected using any of the four active learning methods described here, the classifier achieved a higher F-score (harmonic mean of Precision and Recall) than with 3000 randomly chosen protein pairs. The F-score of predicted interactions increased by about 15% with active learning compared with random selection of data.
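For reference, the F-score used above is F = 2PR / (P + R), where P is Precision and R is Recall. The short sketch below computes it from binary labels; the function name `f_score` and the toy labels are illustrative assumptions, not data or code from the paper.

```python
def f_score(true_labels, predicted_labels):
    """Return (precision, recall, F-score) for binary labels, where 1 = interacting pair."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy labels made up for illustration: 3 true positives, 1 false positive, 2 false negatives.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = f_score(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} F={f1:.2f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a classifier cannot reach a high F-score by trading away recall for precision or the reverse.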

CONCLUSION

Active learning algorithms enable more accurate classifiers to be learned from far less labelled data, and prove useful in applications where manual annotation of data is a formidable task. The active learning techniques demonstrated here can also be applied to other proteomics applications such as protein structure prediction and classification.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2684/3009530/983ec7fb7051/1471-2105-11-S1-S57-1.jpg
