Computer Science, North Carolina State University, Raleigh, North Carolina 27606, United States.
Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
J Chem Inf Model. 2021 Jun 28;61(6):2641-2647. doi: 10.1021/acs.jcim.1c00166. Epub 2021 May 25.
The growing quantity of public and private data sets focused on small molecules screened against biological targets or whole organisms provides a wealth of data relevant to drug discovery. This is matched by the availability of machine learning algorithms, such as Support Vector Machines (SVM) and Deep Neural Networks (DNN), that are computationally expensive to run on very large data sets with thousands of molecular descriptors. Quantum computer (QC) algorithms have been proposed as a way to accelerate machine learning relative to classical computer (CC) algorithms, albeit with significant limitations. In cheminformatics, which is widely used in drug discovery, one of the challenges to overcome is the need to compress large numbers of molecular descriptors for use on a QC. Here, we show how to achieve this compression for data sets ranging from hundreds of molecules (SARS-CoV-2) to hundreds of thousands of molecules (whole cell screening data sets for plague and ), using SVM and the data reuploading classifier (a DNN-equivalent algorithm) on a QC, benchmarked against CC and hybrid approaches. This study illustrates the steps needed to become "quantum computer ready" for applying quantum computing to drug discovery, and provides a foundation on which to build this field.
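The abstract does not say how the descriptors are compressed or how the classifier circuit is built, so the following is only a minimal NumPy sketch under stated assumptions. It uses principal component analysis (an assumed stand-in, not necessarily the authors' compression method) to reduce a descriptor matrix to three features rescaled to [0, π]. It then feeds each compressed molecule into a simulated single-qubit data re-uploading classifier in the style of Pérez-Salinas et al., where every layer re-encodes the input through a general rotation U(θ_l + w_l ∘ x). The function names (`compress_descriptors`, `reuploading_prob0`), the random descriptor matrix, and the untrained parameters are hypothetical placeholders.

```python
import numpy as np

def compress_descriptors(X, n_components=3):
    """Reduce an (n_molecules, n_descriptors) matrix to n_components features.
    ASSUMPTION: PCA via SVD is used here as an illustrative compression method;
    each output feature is rescaled to [0, pi] so it can act as a rotation angle."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                          # center each descriptor column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T                     # project onto top principal axes
    zmin, zmax = Z.min(axis=0), Z.max(axis=0)
    span = np.where(zmax > zmin, zmax - zmin, 1.0)   # guard against constant columns
    return np.pi * (Z - zmin) / span

def ry(t):
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(t):
    """Single-qubit rotation about the Z axis."""
    return np.array([[np.exp(-0.5j * t), 0], [0, np.exp(0.5j * t)]], dtype=complex)

def u3(phi):
    """General single-qubit rotation RZ(phi[2]) RY(phi[1]) RZ(phi[0])."""
    return rz(phi[2]) @ ry(phi[1]) @ rz(phi[0])

def reuploading_prob0(x, thetas, weights):
    """Simulated single-qubit data re-uploading classifier (hypothetical helper).
    Each layer l re-encodes the 3-feature input x as U(theta_l + w_l * x).
    Returns P(|0>), read here as the score for class 0."""
    state = np.array([1.0, 0.0], dtype=complex)      # start in |0>
    for theta, w in zip(thetas, weights):
        state = u3(theta + w * x) @ state
    return float(abs(state[0]) ** 2)

# Usage with random placeholder data (not a real screening data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))        # 50 hypothetical molecules x 200 descriptors
Z = compress_descriptors(X, 3)        # compressed to 3 angle-encoded features
n_layers = 4
thetas = rng.normal(size=(n_layers, 3))   # untrained, randomly initialized
weights = rng.normal(size=(n_layers, 3))
scores = [reuploading_prob0(z, thetas, weights) for z in Z]
```

Training (fitting `thetas` and `weights` to labeled actives and inactives) is omitted; the sketch only shows how a compressed descriptor vector can be re-uploaded into a small circuit, which is why aggressive compression matters when qubits are scarce.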