Bosc Nicolas, Felix Eloy, Arcila Ricardo, Mendez David, Saunders Martin R, Green Darren V S, Ochoada Jason, Shelat Anang A, Martin Eric J, Iyer Preeti, Engkvist Ola, Verras Andreas, Duffy James, Burrows Jeremy, Gardner J Mark F, Leach Andrew R
European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD, Hinxton, Cambridge, United Kingdom.
Department of Molecular Design, Data and Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Hertfordshire, SG1 2NY, Stevenage, UK.
J Cheminform. 2021 Feb 22;13(1):13. doi: 10.1186/s13321-021-00487-2.
Malaria is a disease affecting hundreds of millions of people across the world, mainly in developing countries and especially in sub-Saharan Africa. It causes hundreds of thousands of deaths each year, and there is an ever-present need to identify and develop effective new therapies to tackle the disease and overcome increasing drug resistance. Here, we extend a previous study in which a number of partners collaborated to develop a consensus in silico model that can be used to identify novel molecules that may have antimalarial properties. The performance of machine learning methods generally improves with the number of data points available for training. One practical challenge in building large training sets is that the data are often proprietary and cannot be straightforwardly integrated. We addressed this by sharing QSAR models, each built on a private data set, rather than the underlying data. We describe the development of an open-source software platform for creating such models, a comprehensive evaluation of methods to create a single consensus model, and a web platform called MAIP, available at https://www.ebi.ac.uk/chembl/maip/. MAIP is freely available for the wider community to make large-scale predictions of potential malaria-inhibiting compounds. This project also highlights some of the practical challenges in reproducing published computational methods and the opportunities that open-source software can offer to the community.
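The idea of combining separately built partner models into one consensus prediction can be illustrated with a minimal sketch. It assumes each shared model can be treated as a function from a SMILES string to a score between 0 and 1, and it combines the scores with an unweighted mean. That is only one simple option and is not necessarily the consensus method selected in the study. All names here (`consensus_score`, `rank_compounds`, `partner_a`, `partner_b`, `partner_c`) are hypothetical, and the toy scoring functions are placeholders, not real QSAR models.

```python
from statistics import mean
from typing import Callable, List, Tuple

# Assumption: a "model" is any callable mapping a SMILES string to a score in [0, 1].
# In practice each partner's model would be trained on that partner's private data.
Model = Callable[[str], float]


def consensus_score(smiles: str, models: List[Model]) -> float:
    """Return the unweighted mean of all partner model scores for one compound."""
    if not models:
        raise ValueError("at least one model is required")
    return mean(model(smiles) for model in models)


def rank_compounds(smiles_list: List[str], models: List[Model]) -> List[Tuple[str, float]]:
    """Score every compound and sort from highest to lowest consensus score."""
    scored = [(smiles, consensus_score(smiles, models)) for smiles in smiles_list]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for partner models (hypothetical; they only count characters).
def partner_a(smiles: str) -> float:
    return min(1.0, smiles.count("N") / 4)


def partner_b(smiles: str) -> float:
    return min(1.0, smiles.count("c") / 12)


def partner_c(smiles: str) -> float:
    return 0.5


if __name__ == "__main__":
    partners = [partner_a, partner_b, partner_c]
    for smiles, score in rank_compounds(["CCN", "c1ccccc1N"], partners):
        print(f"{smiles}\t{score:.3f}")
```

Only the model functions cross organisational boundaries in this arrangement; the training data behind each one stays with its owner.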