Tetko Igor V, Engkvist Ola, Koch Uwe, Reymond Jean-Louis, Chen Hongming
Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, b. 60w, D-85764, Neuherberg, Germany.
BIGCHEM GmbH, Ingolstädter Landstraße 1, b. 60w, D-85764, Neuherberg, Germany.
Mol Inform. 2016 Dec;35(11-12):615-621. doi: 10.1002/minf.201600073. Epub 2016 Jul 28.
The increasing volume of biomedical data in chemistry and the life sciences requires the development of new methods and approaches for handling these data. Here, we briefly discuss some challenges and opportunities of this fast-growing area of research, with a focus on those to be addressed within the BIGCHEM project. The article starts with a brief description of some available resources for "Big Data" in chemistry and a discussion of the importance of data quality. We then discuss the challenges of visualizing millions of compounds by combining chemical and biological data, the expectations from mining "Big Data" using advanced machine-learning methods, and the applications of these methods in polypharmacology prediction and target deconvolution in phenotypic screening. We show that the efficient exploration of billions of molecules requires the development of smart strategies. We also address the issue of secure information sharing without disclosing chemical structures, which is critical to enable two-party or multi-party data sharing. Data sharing is important in the context of the recent trend of "open innovation" in the pharmaceutical industry, which has led not only to more information sharing between academia and pharmaceutical companies but also to so-called "precompetitive" collaboration between pharma companies. Finally, we highlight the importance of education in "Big Data" for further progress in this area.