Abstract: Recently, spelling correction of optical character recognition (OCR) output has been one of the main focuses of natural language processing research. The challenges of the Arabic language and the lack of resources have made it difficult to build Arabic OCR systems with high accuracy. Post-processing techniques are used to correct degraded Arabic OCR text. This research presents a new correction model for Arabic OCR errors. The proposed model is based mainly on character segmentation and on character alignment over single characters or multi-character sequences. This research investigates four factors that can affect the proposed model: (i) increasing the size of the training set, (ii) adding the words of the training and test sets to the dictionary used to find the correct words among the candidate words, (iii) using different versions of the OCR application at test time, and (iv) using different fonts at test time. The results show that the first and second factors have a positive effect, whereas the third and fourth factors have a negative effect on the performance of the model. The results also show that the proposed model contributes to enhancing performance.
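The abstract does not give the details of the alignment procedure, so the following is only a minimal sketch of the general idea of character alignment over single and multi-character segments, not the model proposed in the thesis. It assumes that a training set of (OCR word, correct word) pairs is available, uses Python's standard `difflib.SequenceMatcher` as a stand-in alignment method, and uses Latin toy strings in place of Arabic text for readability. The function names `confusion_pairs`, `learn_confusions`, and `candidates` are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def confusion_pairs(ocr_word, truth_word):
    """Align two words character by character and return the differing
    segments as (ocr_segment, truth_segment) pairs. A segment may hold
    one character or several, and may be empty for insertions/deletions."""
    matcher = SequenceMatcher(None, ocr_word, truth_word, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            pairs.append((ocr_word[i1:i2], truth_word[j1:j2]))
    return pairs


def learn_confusions(training_pairs):
    """Count how often each segment confusion occurs in the training set."""
    counts = Counter()
    for ocr_word, truth_word in training_pairs:
        counts.update(confusion_pairs(ocr_word, truth_word))
    return counts


def candidates(ocr_word, confusions, dictionary):
    """Apply each learned confusion at every position where it matches and
    keep only the rewritten words that appear in the dictionary."""
    found = set()
    for (wrong, right), _count in confusions.most_common():
        if not wrong:
            continue  # assumption: skip pure insertions to keep the sketch short
        start = ocr_word.find(wrong)
        while start != -1:
            candidate = ocr_word[:start] + right + ocr_word[start + len(wrong):]
            if candidate in dictionary:
                found.add(candidate)
            start = ocr_word.find(wrong, start + 1)
    return found


# Toy training data (hypothetical): one multi-character and one
# single-character OCR confusion.
training = [("modem", "modern"), ("cat", "cot")]
confusions = learn_confusions(training)
suggestions = candidates("bum", confusions, {"burn", "cot"})
```

In this sketch, a larger training set yields more learned confusions, and a larger dictionary admits more candidate words, which loosely mirrors factors (i) and (ii) examined in the abstract.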