Shin Woosub, Gennari John H, Hellerstein Joseph L, Sauro Herbert M
Auckland Bioengineering Institute, University of Auckland, Auckland, 1010, New Zealand.
Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, 98195, WA, USA.
bioRxiv. 2023 Jul 21:2023.07.19.549722. doi: 10.1101/2023.07.19.549722.
Annotations of biochemical models provide details of chemical species, documentation of chemical reactions, and other essential information. Unfortunately, the vast majority of biochemical models have few, if any, annotations, or the annotations provide insufficient detail to understand the limitations of the model. The quality and quantity of annotations can be improved by developing tools that recommend annotations. For example, recommender tools have been developed for annotations of genes. Although annotating genes is conceptually similar to annotating biochemical models, there are important technical differences that make it difficult to directly apply this prior work.
We present AMAS, a system that predicts annotations for elements of models represented in the Systems Biology Markup Language (SBML) community standard. We provide a general framework for predicting model annotations for a query element based on a database of annotated reference elements and a match score function that calculates the similarity between the query element and reference elements. The framework is instantiated for specific element types (e.g., species, reactions) by specifying the reference database (e.g., ChEBI for species) and the match score function (e.g., string similarity). We analyze the computational efficiency and prediction quality of AMAS for species and reactions in BiGG and BioModels and find that it has sub-second response times and accuracy between 80% and 95%, depending on what is being predicted. We have incorporated AMAS into an open-source, pip-installable Python package that can run as a command-line tool that predicts annotations for species and reactions and adds them to an SBML model.
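The framework described above (a query element, a database of annotated reference elements, and a match score function) can be illustrated with a minimal sketch. This is not the AMAS implementation or its API: the names `REFERENCE_SPECIES`, `match_score`, and `predict_annotations` are hypothetical, the three-entry dictionary is a hand-written stand-in for a reference database such as ChEBI, and the standard-library `difflib.SequenceMatcher` ratio is assumed here as one possible string-similarity score.

```python
from difflib import SequenceMatcher

# Toy stand-in for a reference database (assumption: NOT the data AMAS ships).
# Maps an annotation identifier to names by which the element is known.
REFERENCE_SPECIES = {
    "CHEBI:15377": ["water", "h2o"],
    "CHEBI:15422": ["atp", "adenosine triphosphate"],
    "CHEBI:17234": ["glucose"],
}


def match_score(query, reference_name):
    """Case-insensitive string similarity in [0, 1] between two names."""
    return SequenceMatcher(None, query.lower(), reference_name.lower()).ratio()


def predict_annotations(query, reference, top_k=3):
    """Rank reference identifiers by their best match score against the query."""
    scored = []
    for identifier, names in reference.items():
        # A reference element scores as well as its best-matching name.
        best = max(match_score(query, name) for name in names)
        scored.append((identifier, best))
    # Highest score first; ties keep dictionary order because sort is stable.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical query: the display name of a species in an SBML model.
predictions = predict_annotations("Glucose", REFERENCE_SPECIES)
for identifier, score in predictions:
    print(f"{identifier}\t{score:.2f}")
```

Instantiating the same pattern for another element type would, under these assumptions, mean swapping in a different reference dictionary and a different `match_score` function while leaving `predict_annotations` unchanged.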
Our project is hosted at https://github.com/sys-bio/AMAS, where we provide examples, documentation, and source code files. Our source code is licensed under the MIT open-source license.