Department of Computer and Information Technology, Politehnica University of Timişoara, Timişoara 300223, Romania.
Department I-Drug Analysis, "Victor Babeş" University of Medicine and Pharmacy Timişoara, Timişoara 300041, Romania.
Gigascience. 2022 Dec 28;12(1). doi: 10.1093/gigascience/giad011. Epub 2023 Mar 9.
Widespread bioinformatics applications such as drug repositioning or drug-drug interaction prediction rely on recent advances in machine learning, complex network science, and comprehensive drug datasets comprising the latest research results in molecular biology, biochemistry, or pharmacology. The problem is that these drug datasets carry much uncertainty: we know the drug-drug or drug-target interactions reported in research papers, but we cannot know whether the unreported interactions are absent or yet to be discovered. This uncertainty hampers the accuracy of such bioinformatics applications.
We use complex network statistics tools and simulations of randomly inserted, previously unaccounted interactions in drug-drug and drug-target interaction networks (built with data from DrugBank versions released over the past decade) to investigate whether the abundance of new research data (included in the latest dataset versions) mitigates the uncertainty issue. Our results show that the drug-drug interaction networks built with the latest dataset versions become very dense and, therefore, almost impossible to analyze with conventional complex network methods. On the other hand, for the latest drug database versions, drug-target networks still include much uncertainty; however, the robustness of complex network analysis methods slightly improves.
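The random-insertion experiment described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy edge list, the function names, and the choice of a degree-based top-k ranking as the statistic whose robustness is measured are all assumptions made here for illustration, and no DrugBank data is involved.

```python
import random
from itertools import combinations


def density(n_nodes, edges):
    """Fraction of all possible undirected edges that are present."""
    possible = n_nodes * (n_nodes - 1) // 2
    return len(edges) / possible if possible else 0.0


def insert_random_edges(n_nodes, edges, k, rng):
    """Return a copy of `edges` with up to k absent edges added at random.

    Edges are (u, v) tuples with u < v, so each undirected edge is stored once.
    """
    absent = [e for e in combinations(range(n_nodes), 2) if e not in edges]
    return edges | set(rng.sample(absent, min(k, len(absent))))


def degree_ranking(n_nodes, edges):
    """Nodes sorted by decreasing degree (ties broken by node index)."""
    deg = [0] * n_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(range(n_nodes), key=lambda i: (-deg[i], i))


def top_k_overlap(rank_a, rank_b, k):
    """Share of the top-k nodes that two rankings have in common."""
    return len(set(rank_a[:k]) & set(rank_b[:k])) / k


# Hypothetical toy interaction network with 8 drugs (nodes 0..7).
rng = random.Random(0)
n = 8
edges = {(0, 1), (0, 2), (0, 3), (1, 2), (4, 5), (6, 7)}

# Simulate 3 previously unaccounted interactions.
perturbed = insert_random_edges(n, edges, 3, rng)

before = degree_ranking(n, edges)
after = degree_ranking(n, perturbed)
print("density before:", density(n, edges))
print("density after: ", density(n, perturbed))
print("top-3 overlap: ", top_k_overlap(before, after, 3))
```

Repeating the insertion over many random seeds and averaging the overlap gives a rough robustness estimate: the closer the average stays to 1, the less the chosen statistic depends on interactions that may still be undiscovered.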
Our big data analysis results pinpoint future research directions to improve the quality and practicality of drug databases for bioinformatics applications: benchmarking for drug-target interaction prediction and drug-drug interaction severity standardization.