Graduate School of Engineering; Glycan and Life Systems Integration Center, Soka University, Hachioji, Tokyo, 192-8577, Japan.
BMC Microbiol. 2021 Nov 22;21(1):325. doi: 10.1186/s12866-021-02384-y.
The accumulation of abundant glycomics data has led to the development of many useful databases that aid in understanding the functions of glycans and their impact on cellular activity. At the same time, efforts to share data between glycomics databases and other biological databases have contributed to the creation of new knowledgebases. However, differences in the data types used for data description have impeded data sharing for knowledge integration. To address this, various groups have adopted Semantic Web techniques, including the Resource Description Framework (RDF) and ontology development, to standardize the format for data exchange. These semantic data have contributed to the expansion of knowledgebases and hold promise for providing data that can be intelligently processed. Bench biologists, who are experts in experimental findings, are both end users and data producers. It is therefore essential to lower the technical barrier that bench biologists face when making their experimental data compatible with standard formats for data sharing.
There are many essential concepts and practical techniques for data integration, but there is no method that enables researchers to easily apply Semantic Web techniques to their experimental data. We implemented our procedure on unformatted information about E. coli O-antigen structures collected from the web and show how this information can be expressed as formatted data compliant with Semantic Web standards. In particular, we described the E. coli O-antigen biosynthesis pathway using the BioPAX ontology, which was developed to support data exchange between pathway databases.
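As a rough illustration of what expressing a pathway step as Semantic Web data can look like, the sketch below emits RDF triples in N-Triples syntax for a single reaction using BioPAX Level 3 class and property names (Pathway, BiochemicalReaction, PhysicalEntity, pathwayComponent, left, right, displayName). It is not the procedure used in this work: the `http://example.org/o-antigen/` namespace, the function names, and the identifiers such as `reaction_1` are hypothetical placeholders, and typing participants as the generic PhysicalEntity is an assumption made only to keep the example small.

```python
# Illustrative sketch only. The EX namespace and all identifiers passed in
# are hypothetical placeholders, not data or IRIs from the paper.
BP = "http://www.biopax.org/release/biopax-level3.owl#"  # BioPAX Level 3 namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
EX = "http://example.org/o-antigen/"  # hypothetical namespace for local resources


def iri(value):
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Serialize a string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{XSD_STRING}>'


def triple(subject, predicate, obj):
    """Format one subject-predicate-object statement as an N-Triples line."""
    return f"{subject} {predicate} {obj} ."


def describe_reaction(pathway_id, reaction_id, name, substrate_id, product_id):
    """Return N-Triples lines linking one reaction to its pathway and participants."""
    pathway = iri(EX + pathway_id)
    reaction = iri(EX + reaction_id)
    substrate = iri(EX + substrate_id)
    product = iri(EX + product_id)
    a = iri(RDF_TYPE)
    return [
        triple(pathway, a, iri(BP + "Pathway")),
        triple(reaction, a, iri(BP + "BiochemicalReaction")),
        triple(pathway, iri(BP + "pathwayComponent"), reaction),
        triple(reaction, iri(BP + "displayName"), literal(name)),
        # Assumption: participants are typed with the generic PhysicalEntity class.
        triple(substrate, a, iri(BP + "PhysicalEntity")),
        triple(product, a, iri(BP + "PhysicalEntity")),
        triple(reaction, iri(BP + "left"), substrate),
        triple(reaction, iri(BP + "right"), product),
    ]


if __name__ == "__main__":
    for line in describe_reaction(
        "pathway_1", "reaction_1", "hypothetical step 1", "substrate_1", "product_1"
    ):
        print(line)
```

Each printed line is one statement that a triple store or RDF parser could load. A curated dataset would replace the placeholder identifiers with stable IRIs for the actual glycan structures and enzymes.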
The method we implemented to semantically describe O-antigen biosynthesis should help biologists understand how glycan information, including relevant pathway reaction data, can be easily shared. We hope this method can help lower the technical barrier encountered when experimental findings are formulated into formal representations, and can lead bench scientists to readily participate in the construction of new knowledgebases that are integrated with existing ones. Such integration over the Semantic Web will support future work in artificial intelligence and machine learning, allowing computers to infer new relationships and hypotheses in the life sciences.