Clark Alex M, Gedeck Peter, Cheung Philip P, Bunin Barry A
Collaborative Drug Discovery, Inc. 1633 Bayshore Hwy, Suite 342, Burlingame, California 94010, United States.
ACS Omega. 2021 Aug 18;6(34):22400-22409. doi: 10.1021/acsomega.1c03311. eCollection 2021 Aug 31.
Chemical mixtures have only recently been addressed by open standards and data structures that capture machine-readable descriptions for informatics use. At present, essentially all information about mixtures is transmitted as short text descriptions that only trained scientists can read, and there are no accessible repositories of marked-up mixture data. We have designed a machine learning tool that interprets mixture descriptions and upgrades them to a high-level machine-readable format, which can in turn be used to generate notation. The interpretation achieves a high success rate and can be used at scale to mark up large catalogs and inventories, with some expert checking to catch edge cases. The training data accumulated during the project is made openly available, along with previously released mixture editing tools and utilities.