IEEE/ACM Trans Comput Biol Bioinform. 2020 Nov-Dec;17(6):2074-2085. doi: 10.1109/TCBB.2019.2913368. Epub 2020 Dec 8.
Differences in data representation and naming conventions across the commercial screen files supplied by different companies make the automated analysis of crystallization experiments difficult and time-consuming. To reduce the human effort this requires, we present an approach that computationally matches the elements of two schemas using linguistic schema matching methods and then transforms the input screen format into another format whose naming is defined by the user. We tested the approach on a number of commercial screens from different companies; it achieved an overall schema matching accuracy of 97 percent, which is significantly better than the other two matchers we tested. Our tool enables mapping a screen file from one format into another format preferred by the expert, using the expert's preferred chemical names.
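The two steps described in the abstract, matching schema elements by name and then rewriting a screen file under the user's naming, can be illustrated with a minimal sketch. This is not the authors' matcher: the abstract does not specify which linguistic methods are used, so the character-level similarity from Python's standard `difflib` module stands in for them here. The field names, the 0.6 threshold, and the helper names (`normalize`, `match_schemas`, `transform`) are all hypothetical and chosen only for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    # Lowercase and collapse punctuation/underscores into single spaces.
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def similarity(a, b):
    # Character-level similarity in [0, 1]; a stand-in for a linguistic matcher.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_schemas(source_fields, target_fields, threshold=0.6):
    # Map each source field to its most similar target field,
    # keeping only matches at or above the (assumed) threshold.
    mapping = {}
    for s in source_fields:
        best = max(target_fields, key=lambda t: similarity(s, t))
        if similarity(s, best) >= threshold:
            mapping[s] = best
    return mapping


def transform(rows, mapping):
    # Rewrite each record using the target names; unmatched fields are dropped.
    return [{mapping[k]: v for k, v in row.items() if k in mapping}
            for row in rows]


# Hypothetical vendor schema and user-preferred schema.
vendor_fields = ["Well_No", "salt_conc", "Buffer-pH"]
user_fields = ["well number", "salt concentration", "buffer ph"]

mapping = match_schemas(vendor_fields, user_fields)
converted = transform(
    [{"Well_No": "A1", "salt_conc": "0.2 M", "Buffer-pH": "7.5"}], mapping
)
```

In this sketch, fields with no sufficiently similar counterpart are left unmapped rather than forced onto a poor match, so an expert can review them by hand.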