Reguly Teresa, Breitkreutz Ashton, Boucher Lorrie, Breitkreutz Bobby-Joe, Hon Gary C, Myers Chad L, Parsons Ainslie, Friesen Helena, Oughtred Rose, Tong Amy, Stark Chris, Ho Yuen, Botstein David, Andrews Brenda, Boone Charles, Troyanskaya Olga G, Ideker Trey, Dolinski Kara, Batada Nizar N, Tyers Mike
Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto ON M5G 1X5, Canada.
Department of Medical Genetics and Microbiology, University of Toronto, Toronto ON M5S 1A8, Canada.
J Biol. 2006;5(4):11. doi: 10.1186/jbiol36. Epub 2006 Jun 8.
The study of complex biological networks and prediction of gene function has been enabled by high-throughput (HTP) methods for detection of genetic and protein interactions. Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. Although a vast number of well substantiated interactions are recorded in the scientific literature, these data have not yet been distilled into networks that enable system-level inference.
We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. This literature-curated (LC) dataset contains 33,311 interactions, on the order of all extant HTP datasets combined. Surprisingly, HTP protein-interaction datasets currently achieve only around 14% coverage of the interactions in the literature. The LC network nevertheless shares attributes with HTP networks, including scale-free connectivity and correlations between interactions, abundance, localization, and expression. We find that essential genes or proteins are enriched for interactions with other essential genes or proteins, suggesting that the global network may be functionally unified. This interconnectivity is supported by a substantial overlap of protein and genetic interactions in the LC dataset. We show that the LC dataset considerably improves the predictive power of network-analysis approaches. The full LC dataset is available at the BioGRID (http://www.thebiogrid.org) and SGD (http://www.yeastgenome.org/) databases.
Comprehensive datasets of biological interactions derived from the primary literature provide critical benchmarks for HTP methods, augment functional prediction, and reveal system-level attributes of biological networks.
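The coverage figure quoted in the abstract (HTP protein-interaction datasets recovering around 14% of literature interactions) can be read as the fraction of LC interactions that also appear in an HTP dataset. The following is a minimal sketch of that calculation, not the authors' pipeline: the function names `normalize` and `coverage`, the treatment of every interaction as an unordered gene pair, and the toy pairs are all assumptions made for illustration.

```python
def normalize(pairs):
    """Treat each interaction as an unordered pair so (A, B) equals (B, A)."""
    return {frozenset(p) for p in pairs}


def coverage(lc_pairs, htp_pairs):
    """Fraction of literature-curated (LC) interactions also present in the HTP set."""
    lc = normalize(lc_pairs)
    htp = normalize(htp_pairs)
    if not lc:
        return 0.0  # no LC interactions to cover
    return len(lc & htp) / len(lc)


# Toy pairs invented for illustration only; they are not drawn from the LC dataset.
lc_toy = [("CDC28", "CLN2"), ("CDC28", "CLB2"), ("SWI4", "SWI6"), ("CDC28", "SIC1")]
htp_toy = [("CLN2", "CDC28"), ("ACT1", "MYO2")]

print(coverage(lc_toy, htp_toy))
```

In this toy case one of the four LC pairs is recovered by the HTP list, regardless of the order in which the two partners are written. A real comparison would also need to map gene aliases to systematic names and to decide whether directed evidence (for example bait and prey) should be collapsed, which this sketch does not attempt.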