Zheng Yefeng, Li Huiping, Doermann David
Language and Media Processing Laboratory, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742-3275, USA.
IEEE Trans Pattern Anal Mach Intell. 2005 May;27(5):777-92. doi: 10.1109/TPAMI.2005.89.
Detecting groups of parallel lines is important in applications such as form processing and the extraction of text (handwriting) from rule-lined paper. These tasks can be very challenging in degraded documents, where the lines are severely broken. In this paper, we propose a novel model-based method that incorporates high-level context to detect these lines. After preprocessing (such as skew correction and text filtering), we use trained Hidden Markov Models (HMMs) with Viterbi decoding to locate the optimal positions of all lines simultaneously on the horizontal or vertical projection profiles. The algorithm is trainable, so it can be easily adapted to different application scenarios. Experiments on known form processing and rule line detection show that our method is robust and achieves better results than other widely used line detection methods.
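The decoding step described above can be illustrated with a minimal sketch. It is not the authors' model: it assumes a two-state HMM (gap row, line row) with hand-picked parameters instead of trained ones, and a binomial emission model on the ink count of each row. The function names and the parameters `p_gap`, `p_line`, and `stay` are hypothetical and chosen for illustration only.

```python
import math

def projection_profile(img):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in img]

def viterbi(log_init, log_trans, log_emit):
    """Return the most likely state sequence for per-row log-emissions."""
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t][j]: best predecessor of state j at row t + 1
    for emit in log_emit[1:]:
        prev, score, ptr = score, [], []
        for j in range(n_states):
            best = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][j] + emit[j])
        back.append(ptr)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def detect_line_rows(img, p_gap=0.05, p_line=0.6, stay=0.9):
    """Label each row as gap (0) or line (1) and return the line rows.

    Assumed parameters: p_gap and p_line are the per-pixel ink
    probabilities in each state; stay is the self-transition probability.
    """
    width = len(img[0])
    probs = (p_gap, p_line)
    # Binomial log-likelihood of the ink count (constant term dropped).
    log_emit = [[k * math.log(p) + (width - k) * math.log(1 - p) for p in probs]
                for k in projection_profile(img)]
    log_trans = [[math.log(stay), math.log(1 - stay)],
                 [math.log(1 - stay), math.log(stay)]]
    log_init = [math.log(0.5)] * 2
    path = viterbi(log_init, log_trans, log_emit)
    return [r for r, s in enumerate(path) if s == 1]

# Synthetic 20 x 40 page with two degraded horizontal lines.
demo = [[0] * 40 for _ in range(20)]
demo[5] = [1, 0] * 20            # line with every other pixel missing
demo[12] = [1] * 30 + [0] * 10   # line truncated on the right
demo[8][3] = 1                   # isolated noise pixel
rows = detect_line_rows(demo)
print(rows)
```

Because all rows are decoded jointly, a broken line that fills only half of its row is still labeled as a line, while an isolated noise pixel is not. A model closer to the one described in the abstract would presumably also encode the expected spacing between lines in its states and learn its parameters from training data.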