Diao Jin, Zhou Zhangbing, Xue Xiao, Zhao Deng, Chen Shengpeng
School of Information Engineering, China University of Geosciences (Beijing), Beijing, China.
Computer Science Department, TELECOM SudParis, Evry, France.
Front Genet. 2022 Aug 26;13:941996. doi: 10.3389/fgene.2022.941996. eCollection 2022.
Constructing a novel bioinformatic workflow by reusing and repurposing fragments across workflows is regarded as an error-avoiding and effort-saving strategy. Traditional techniques discover scientific workflow fragments by leveraging their profiles and the historical usage of their activities (or services). However, the social relations of workflows, including relations between services and their developers, have not been explored extensively. Current techniques mostly describe invoking relations between services and can hardly reveal implicit relations between them. To address this challenge, we propose a social-aware scientific workflow knowledge graph that captures common types of entities and various types of relations by analyzing information about bioinformatic workflows and their developers recorded in repositories. Using attributes of entities such as credit and creation time, the joint impact of several positive and negative links in the knowledge graph is identified to evaluate the feasibility of workflow fragment construction. To facilitate the discovery of single services, a service invoking network is extracted from the knowledge graph, and service communities are constructed accordingly. A bioinformatic workflow fragment discovery mechanism based on Yen's method is developed to discover appropriate fragments with respect to a user's requirements. Extensive experiments are conducted on bioinformatic workflows publicly accessible at the myExperiment repository. Evaluation results show that our technique performs better than state-of-the-art techniques in terms of precision, recall, and F1 score.
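As an illustration of the path-enumeration idea behind Yen's method, the following is a minimal sketch of Yen's k-shortest loopless paths algorithm applied to a toy service invoking network. It is not the authors' implementation: the abstract does not say how edge weights are derived from the knowledge graph or how paths map to fragments, so the service names (fetch_sequence, align, blast, build_tree), the integer edge costs, and the function names are all hypothetical and chosen only for the example.

```python
import heapq


def dijkstra(graph, source, target, removed_edges=frozenset(), removed_nodes=frozenset()):
    """Return (cost, path) for the cheapest path, or None if target is unreachable."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, weight in graph.get(node, {}).items():
            if nxt in visited or nxt in removed_nodes or (node, nxt) in removed_edges:
                continue
            heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return None


def path_cost(graph, path):
    """Sum the edge weights along a path given as a list of nodes."""
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


def yen_k_shortest_paths(graph, source, target, k):
    """Return up to k loopless (cost, path) pairs in non-decreasing cost order."""
    first = dijkstra(graph, source, target)
    if first is None:
        return []
    accepted = [first]
    candidates = []  # min-heap of (cost, path) not yet accepted
    while len(accepted) < k:
        last_path = accepted[-1][1]
        for i in range(len(last_path) - 1):
            spur_node = last_path[i]
            root = last_path[:i + 1]
            # Block the next edge of every accepted path that shares this root.
            removed_edges = {
                (p[i], p[i + 1])
                for _, p in accepted
                if len(p) > i + 1 and p[:i + 1] == root
            }
            # Block root nodes (except the spur node) so paths stay loopless.
            removed_nodes = set(root[:-1])
            spur = dijkstra(graph, spur_node, target, removed_edges, removed_nodes)
            if spur is None:
                continue
            full_path = root[:-1] + spur[1]
            candidate = (path_cost(graph, full_path), full_path)
            if candidate not in candidates and candidate not in accepted:
                heapq.heappush(candidates, candidate)
        if not candidates:
            break
        accepted.append(heapq.heappop(candidates))
    return accepted


# Hypothetical service invoking network: node -> {successor: invocation cost}.
GRAPH = {
    "fetch_sequence": {"align": 1, "blast": 3},
    "align": {"build_tree": 2, "blast": 1},
    "blast": {"build_tree": 2},
    "build_tree": {},
}

if __name__ == "__main__":
    for cost, path in yen_k_shortest_paths(GRAPH, "fetch_sequence", "build_tree", 3):
        print(cost, " -> ".join(path))
```

In this sketch each returned path is a candidate chain of services between a required input service and a required output service, ranked by total cost. A cost could plausibly be derived from link attributes in the knowledge graph, but that mapping is an assumption here rather than something the abstract specifies.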