Al-Dabbagh Mohammed Mumtaz, Salim Naomie, Rehman Amjad, Alkawaz Mohammed Hazim, Saba Tanzila, Al-Rodhaan Mznah, Al-Dhelaan Abdullah
Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia; Faculty of Computer Sciences and Mathematics, University of Mosul, Mosul, Iraq.
Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia.
ScientificWorldJournal. 2014;2014:612787. doi: 10.1155/2014/612787. Epub 2014 Sep 17.
This paper presents a novel feature-mining approach for documents that cannot be mined via optical character recognition (OCR). By identifying the close relationship between the text and graphical components, the proposed technique extracts the Start, End, and Exact values for each bar. The word 2-gram and Euclidean distance methods are then used to detect plagiarism in bar charts.
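The two comparison measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the measures are combined or thresholded, so everything here is an assumption for illustration: the function names are hypothetical, the word 2-gram comparison is scored with Jaccard overlap, and the Euclidean distance is taken over equal-length lists of per-bar values.

```python
import math


def word_bigrams(text):
    """Return the set of adjacent word pairs (word 2-grams) in text."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def bigram_similarity(text_a, text_b):
    """Jaccard overlap of word 2-grams (an assumed scoring choice).

    Returns a value in [0, 1]; 0.0 when neither text has any 2-gram.
    """
    a, b = word_bigrams(text_a), word_bigrams(text_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def euclidean_distance(values_a, values_b):
    """Euclidean distance between two equal-length lists of bar values."""
    if len(values_a) != len(values_b):
        raise ValueError("bar value lists must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(values_a, values_b)))
```

Under these assumptions, a high 2-gram similarity between the textual components of two charts together with a small distance between their bar values would mark the pair as a candidate for closer inspection; any decision threshold would have to come from the paper itself.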