Lovis C, Baud R H
Puget Sound Health Care System, Seattle, Washington, USA.
J Am Med Inform Assoc. 2000 Jul-Aug;7(4):378-91. doi: 10.1136/jamia.2000.0070378.
The authors consider the problem of exact string pattern matching using algorithms that do not require any preprocessing of the text. To choose the most appropriate algorithm, the distinctive features of medical language must be taken into account. The characteristics of medical language relevant to this choice are emphasized, the best of the reviewed algorithms is proposed, and detailed evaluations of time complexity for processing medical texts are provided.
The authors first illustrate and discuss the techniques of various string pattern-matching algorithms. Next, the source code and the behavior of representative exact string pattern-matching algorithms are presented in a comprehensive manner to promote their implementation. Detailed explanations of the use of various techniques to improve performance are given.
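The paper presents its own source code, which is not reproduced here. As a point of reference, the following is a minimal Python sketch of the simplest exact matcher, the brute-force (naive) approach. It needs no preprocessing of either the text or the pattern: it aligns the pattern at every text position and compares characters left to right until the first mismatch. The function name `naive_search` is hypothetical, and returning an empty list for an empty pattern is a choice made for this sketch.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every start index of pattern in text, overlapping matches included.

    Hypothetical illustrative helper, not the paper's code.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return []  # assumption for this sketch: an empty pattern matches nothing
    hits = []
    # Try every alignment of the pattern against the text.
    for i in range(n - m + 1):
        j = 0
        # Compare left to right and stop at the first mismatching character.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            hits.append(i)
    return hits
```

For example, `naive_search("hepatitis hepatic", "hepat")` returns `[0, 10]`. In the worst case this performs about n·m character comparisons, which is the cost the faster algorithms try to avoid.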
Execution times measured on English medical texts are presented. These measurements lead to results that differ from those reported in the computer science literature, which are typically obtained with normally distributed texts.
The Boyer-Moore-Horspool algorithm achieves the best overall results when used with medical texts. This algorithm usually performs at least twice as fast as the other algorithms tested.
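To make the winning technique concrete, the following is a minimal Python sketch of the textbook Boyer-Moore-Horspool algorithm. It is not the authors' implementation, which was likely built around a fixed-size character table; this sketch uses a dictionary instead. The pattern (not the text) is preprocessed once into a bad-character shift table. At each alignment the pattern is compared right to left, and the window then advances by the shift associated with the text character under the last pattern position. The function name `horspool_search` is hypothetical.

```python
def horspool_search(text: str, pattern: str) -> list[int]:
    """Return every start index of pattern in text using Boyer-Moore-Horspool.

    Hypothetical illustrative helper, not the paper's code.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []  # assumption for this sketch: empty or oversized pattern matches nothing
    # Bad-character table built from pattern[:-1]: shift = distance from the
    # character's last occurrence to the final pattern position. Characters
    # absent from pattern[:-1] fall back to a full shift of m.
    shift = {}
    for k in range(m - 1):
        shift[pattern[k]] = m - 1 - k
    hits = []
    i = 0
    while i <= n - m:
        # Compare the current window right to left.
        j = m - 1
        while j >= 0 and text[i + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(i)
        # Shift on the text character aligned with the last pattern character,
        # whether or not the window matched (so overlapping matches are found).
        i += shift.get(text[i + m - 1], m)
    return hits
```

For example, `horspool_search("hepatitis hepatic", "hepat")` returns `[0, 10]` after examining only three windows, because characters such as `t` and the space do not occur in `pattern[:-1]` and therefore trigger the maximal shift of 5. Large alphabets and relatively long words, as in medical vocabulary, tend to produce many such full-length shifts.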
The time performance of exact string pattern matching can be greatly improved if an efficient algorithm is used. Considering the growing amount of text handled in the electronic patient record, it is worth implementing this efficient algorithm.