Biziukova Nadezhda Yu, Rudik Anastasia V, Dmitriev Alexander V, Tarasova Olga A, Filimonov Dmitry A, Poroikov Vladimir V
Institute of Biomedical Chemistry, 10-8, Pogodinskaya Str., Moscow 119121, Russian Federation.
ACS Omega. 2025 Jan 12;10(3):2459-2471. doi: 10.1021/acsomega.4c05723. eCollection 2025 Jan 28.
Understanding the biotransformation of xenobiotics in the human body is critical for a comprehensive assessment of drug effects, since pharmacologically active drug metabolites may exhibit biological effects that often differ from those of the parent drug. Studies of xenobiotic biotransformation mechanisms have produced numerous publications. Extracting information about parent compounds (substrates) and their metabolites from these texts allows retrieval of data on their biological activities, molecular mechanisms of action, and toxicity. Manual curation of the names of xenobiotics, their metabolites, and biotransformation reactions is challenging because of the large number of publications on the metabolism of pharmaceutical agents. Our aim is to create an annotated corpus of texts, XenoMet, that can be used for automated extraction of the names of xenobiotics, including pharmaceutical agents that undergo biotransformation, and of their metabolites. To create XenoMet, we automatically extracted relevant texts from PubMed using a query based on MeSH terms. Prior to manual annotation, the corpus was annotated semiautomatically with a previously developed rule-based method for extracting parent compounds and their metabolites. The names of biotransformation reactions were recognized using an in-house dictionary. We then manually verified the extracted data, correcting errors in the named entity annotation and identifying the associations between substrates and metabolites. We tested the applicability of XenoMet for reconstructing a metabolic tree and for automated extraction of the chemical names of substrates, metabolites, and biotransformation reactions. A conditional random fields model trained on XenoMet classified the named entities of metabolites, substrates, and biotransformation reactions with an F-score of 0.79.
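The dictionary-based recognition of reaction names and the F-score used for evaluation can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the term list `REACTION_TERMS` is a hypothetical stand-in for the in-house dictionary, which the abstract does not reproduce, and the tokenizer, the `B-REACTION` tag, and the token-level scoring are simplifications rather than the authors' actual pipeline or their conditional random fields model.

```python
import re

# Hypothetical mini-dictionary; NOT the in-house dictionary from the paper.
REACTION_TERMS = ["hydroxylation", "n-demethylation", "glucuronidation", "oxidation"]


def tokenize(text):
    """Split text into word tokens (keeping hyphenated words whole) and punctuation."""
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*|[^\sA-Za-z0-9]", text)


def tag_reactions(tokens, terms=REACTION_TERMS):
    """Tag each token as a reaction name if it appears in the dictionary."""
    vocab = {term.lower() for term in terms}
    return ["B-REACTION" if tok.lower() in vocab else "O" for tok in tokens]


def f_score(gold, pred, label="B-REACTION"):
    """Token-level F1 for one label: harmonic mean of precision and recall."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


sentence = "Imipramine undergoes N-demethylation and hydroxylation in the liver."
tokens = tokenize(sentence)
pred = tag_reactions(tokens)
for token, tag in zip(tokens, pred):
    print(f"{token}\t{tag}")
```

A pure dictionary lookup of this kind cannot distinguish substrates from metabolites, which is one reason a manually verified corpus and a trained sequence model are needed for those entity types.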