Bajpai Akhilesh Kumar, Davuluri Sravanthi, Tiwary Kriti, Narayanan Sithalechumi, Oguru Sailaja, Basavaraju Kavyashree, Dayalan Deena, Thirumurugan Kavitha, Acharya Kshitish K
Structural Biology Lab, Centre for Biomedical Research, School of Bio Sciences & Technology (SBST), Vellore Institute of Technology (VIT) University, Vellore 632014, Tamil Nadu, India; Shodhaka Life Sciences Pvt. Ltd., Electronic City, Phase I, Bengaluru (Bangalore) 560100, Karnataka, India.
Shodhaka Life Sciences Pvt. Ltd., Electronic City, Phase I, Bengaluru (Bangalore) 560100, Karnataka, India.
J Biomed Inform. 2020 Mar;103:103380. doi: 10.1016/j.jbi.2020.103380. Epub 2020 Jan 28.
In the absence of periodic systematic comparisons, biologists and bioinformaticians may be forced to make a subjective selection among the many protein-protein interaction (PPI) databases and tools. We conducted a comprehensive compilation and comparison of such resources. We compiled 375 PPI resources, short-listed 125 important ones (both lists are available at startbioinfo.com), and compared the features and coverage of 16 carefully selected databases related to human PPIs. We quantitatively compared the coverage of 'experimentally verified' as well as 'total' (experimentally verified and predicted) PPIs for these 16 databases. Coverage was compared in two ways: (a) PPIs obtained in response to gene queries using the web interfaces were compared. As a query set, 108 genes were chosen that were either expressed differently across tissues (specific to kidney, testis, or uterus, or ubiquitous, i.e., expressed in 43 normal human tissues) or associated with certain diseases (breast cancer, lung cancer, Alzheimer's, cystic fibrosis, diabetes, and cardiomyopathy). The coverage was also compared for well-studied genes versus less-studied ones. The coverage of the databases for high-quality interactions was separately assessed using a set of literature-curated, experimentally proven PPIs (gold-standard PPI-set); (b) the back-end data from 15 PPI databases were downloaded and compared. Combined results from STRING and UniHI covered around 84% of 'experimentally verified' PPIs. Approximately 94% of the 'total' PPIs available across the databases were retrieved by the combined use of hPRINT, STRING, and IID. Among the experimentally verified PPIs found exclusively in each database, STRING contributed around 71% of the hits. The coverage of certain databases was skewed for some gene-types. Analysis with the gold-standard PPI-set revealed that GPS-Prot, STRING, APID, and HIPPIE each covered ~70% of the curated interactions.
The database usage frequencies did not always correlate with their respective advantages, thereby justifying the need for more frequent studies of this nature.
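The coverage figures above (for example, the share of pooled PPIs retrieved by a pair or triple of databases, and the PPIs found exclusively in one database) can be illustrated with plain set arithmetic. The following is a minimal sketch, not the authors' pipeline: the function names, the database labels db_a/db_b/db_c, and the protein identifiers P1..P13 are all hypothetical placeholders, and it assumes each PPI is an unordered pair of identifiers that have already been mapped to a common namespace.

```python
from itertools import combinations


def normalize(pairs):
    """Store each PPI as an unordered pair so (A, B) and (B, A) count once."""
    return {frozenset(pair) for pair in pairs}


def coverage(selected, databases):
    """Fraction of the pooled PPIs (union over all databases) that the
    selected databases retrieve together."""
    pooled = set().union(*databases.values())
    found = set().union(*(databases[name] for name in selected))
    return len(found) / len(pooled)


def best_combination(databases, k):
    """Exhaustively pick the k databases with the highest combined coverage."""
    return max(combinations(sorted(databases), k),
               key=lambda combo: coverage(combo, databases))


def exclusive(databases):
    """PPIs that appear in exactly one database, keyed by that database."""
    result = {}
    for name, ppis in databases.items():
        others = set().union(*(v for n, v in databases.items() if n != name))
        result[name] = ppis - others
    return result


# Toy data only: hypothetical databases and placeholder protein identifiers.
toy = {
    "db_a": normalize([("P1", "P2"), ("P3", "P4"), ("P5", "P6"), ("P7", "P8")]),
    "db_b": normalize([("P2", "P1"), ("P9", "P10"), ("P11", "P12")]),
    "db_c": normalize([("P5", "P6"), ("P12", "P13")]),
}

if __name__ == "__main__":
    for k in (1, 2):
        combo = best_combination(toy, k)
        print(k, combo, round(coverage(combo, toy), 3))
    print({name: len(ppis) for name, ppis in exclusive(toy).items()})
```

The exhaustive search over combinations is practical only for a small number of databases (16 in the study); the sketch also ignores evidence types, so separating 'experimentally verified' from 'total' PPIs would need one such dictionary per category.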