
Document Type: Tutorial
NI Supported: Yes
Publish Date: Mar 29, 2007

Optical Character Recognition (OCR) with Italic Characters


Overview

Italic characters often overlap in the vertical direction, so NI-OCR tools cannot recognize them directly. A simple image calibration overcomes this problem, after which OCR can be performed directly on the corrected image. This document describes how to perform this calibration in order to read italicized characters.

Trouble with Italic Characters

Optical Character Recognition (OCR) is a programmatic method for recognizing characters in an image. Training and recognizing characters is straightforward, although it may take some time to find the appropriate settings. With italic characters, however, recognition cannot be performed directly: italic characters often overlap in the vertical direction, meaning that no vertical line can be drawn between neighboring characters, so they cannot be split into separate characters. Figure 1 illustrates this difficulty. In Figure 1.a, the character bounding boxes are clearly defined because a vertical line can be drawn between each pair of characters. Figure 1.b shows how the characters may be grouped together when they overlap in the vertical direction. One way to work around the overlap is to rotate each character's bounding box and read each character separately. This individual bounding box rotation is shown in Figure 1.c.

 

Figure 1. Italic characters overlap in the vertical direction.
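
The grouping problem described above can be reproduced with a toy example. The sketch below is plain Python, not NI Vision code, and it assumes a simplified segmentation rule (split wherever a column contains no ink); the function name split_columns and the two small test images are made up for illustration. Two upright strokes are found as two separate groups, while the same two strokes slanted are merged into one.

```python
def split_columns(image):
    """Return (start, end) column ranges separated by fully blank columns.

    `image` is a list of equal-length strings; any non-space character is ink.
    This is a simplified stand-in for character segmentation, not NI-OCR.
    """
    width = len(image[0])
    has_ink = [any(row[x] != " " for row in image) for x in range(width)]
    runs, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a new group of inked columns begins
        elif not ink and start is not None:
            runs.append((start, x - 1))    # a blank column closes the group
            start = None
    if start is not None:
        runs.append((start, width - 1))
    return runs


# Two upright strokes: a blank column separates them.
upright = [
    "#  #",
    "#  #",
    "#  #",
    "#  #",
]

# The same two strokes slanted: every column contains some ink.
slanted = [
    "   #  #",
    "  #  # ",
    " #  #  ",
    "#  #   ",
]

print(split_columns(upright))   # two groups
print(split_columns(slanted))   # one merged group
```

In the slanted image, the top of the first stroke and the bottom of the second stroke share a column, so no vertical cut separates them.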


 

While the method shown in Figure 1.c can be used, it is time-consuming for the user to carefully place a bounding box around each character. Another method for performing OCR when characters overlap vertically is to calibrate the image. Calibration can be used to unitalicize the characters, as shown in Figure 2. Once the image is calibrated as shown in Figure 2, a single Region of Interest (ROI) encompassing the entire word can be drawn and OCR performed as it normally would be.

Figure 2. Calibration can unitalicize a word.
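
To illustrate why removing the slant helps, the following plain Python sketch shifts each row of a toy image back by an amount proportional to its height above the baseline. This is only an assumption-laden stand-in for the correction step: it uses an integer row shift rather than NI Vision calibration, and the names deslant, blank_columns, and the shift_per_row parameter are hypothetical.

```python
def blank_columns(image):
    """Return the indexes of columns that contain no ink (only spaces)."""
    width = len(image[0])
    return [x for x in range(width) if all(row[x] == " " for row in image)]


def deslant(image, shift_per_row):
    """Shift each row left in proportion to its height above the bottom row.

    `shift_per_row` is the slant in pixels per row (an assumed, known value).
    Ink shifted past the left edge would be discarded; the toy image below
    has none there.
    """
    height = len(image)
    corrected = []
    for y, row in enumerate(image):
        shift = (height - 1 - y) * shift_per_row   # top rows move the most
        corrected.append(row[shift:] + " " * shift)
    return corrected


# Two slanted strokes that lean one pixel to the right per row.
slanted = [
    "   #  #",
    "  #  # ",
    " #  #  ",
    "#  #   ",
]

corrected = deslant(slanted, 1)
for row in corrected:
    print(row)

print(blank_columns(slanted))     # no blank column to cut along
print(blank_columns(corrected))   # blank columns now separate the strokes
```

After the shift, both strokes are upright and blank columns appear between them, so a single region around the whole word can be segmented column by column.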


Calibration of Italicized Characters

Because calibration is the key to this adjustment, it is described in more detail here. A simple perspective calibration achieves the desired result. With NI Vision, this calibration can be performed either by carefully selecting user-defined points or by using a grid that is slanted at the same angle as the italics. For demonstration purposes, this article describes how to perform the calibration with user-defined points. First, select points at the far corners of your word. These points should align with each other horizontally and be slanted at the same angle as the characters, as shown in Figure 3. After selecting the four points, specify the coordinates that will result in unslanted characters. Example coordinates are shown in Figure 3.

 

Figure 3. Select user-defined points and specify appropriate coordinates.
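
The four-point idea can be sketched numerically. The plain Python example below fits a general perspective (projective) mapping to four point pairs by solving an 8-by-8 linear system, then maps a pixel through it. It is not the NI Vision calibration routine, whose internals are not described here. The corner coordinates are hypothetical values chosen for illustration, not the ones in Figure 3, and the names solve_linear, fit_perspective, and apply_perspective are made up.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_perspective(src, dst):
    """Fit u = (a*x + b*y + c) / w, v = (d*x + e*y + f) / w, w = g*x + k*y + 1."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    return solve_linear(rows, rhs)


def apply_perspective(h, x, y):
    """Map one pixel coordinate through the fitted transform."""
    a, b, c, d, e, f, g, k = h
    w = g * x + k * y + 1.0
    return ((a * x + b * y + c) / w, (d * x + e * y + f) / w)


# Hypothetical slanted corners (pixels, y pointing down): top-left, top-right,
# bottom-right, bottom-left of a word leaning to the right.
src = [(30, 0), (130, 0), (100, 60), (0, 60)]
# Coordinates assigned to those corners so that the word becomes upright.
dst = [(0, 0), (100, 0), (100, 60), (0, 60)]

h = fit_perspective(src, dst)
print(apply_perspective(h, 65, 30))   # center of the slanted word
```

Because the selected points form a parallelogram and the target is a rectangle, the fitted mapping in this example reduces to a horizontal shear: the two perspective terms come out at approximately zero.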


 

After you have calibrated your selected points or your grid, you can save the calibrated image and use it to calibrate future images, provided that each future image is the same size as the calibrated image. Once an image has been calibrated, it can be corrected, and OCR can then be performed as usual by selecting an ROI around your word and reading the word based on your trained character file. For more information about calibration, see the examples installed with NI Vision or the examples available online. You can also refer to Spatial Calibration or the NI Vision Concepts Manual for more details about calibration.

Example Code

A small example program demonstrating this method is attached below. The example program performs the following steps:

1. Opens an image containing italicized words

2. Calibrates the image using a previously saved calibrated image

3. Corrects the image containing the italicized words

4. Loads the character set and parameters

5. Performs OCR on the corrected image

6. Displays the string that was read and overlays the character bounding boxes and the Region of Interest

LabVIEW 7.1 or higher and Vision 8.0 or higher are required to run this code. Download and extract the files to a folder on your computer, then run lv71_italic_ocr.vi.

Additional Links

NI Developer Zone: Spatial Calibration

NI Vision Concepts Manual

Downloads

italicocr.zip



Legal
This tutorial (this "tutorial") was developed by National Instruments ("NI"). Although technical support of this tutorial may be made available by National Instruments, the content in this tutorial may not be completely tested and verified, and NI does not guarantee its quality in any way or that NI will continue to support this content with each new revision of related products and drivers. THIS TUTORIAL IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND AND SUBJECT TO CERTAIN RESTRICTIONS AS MORE SPECIFICALLY SET FORTH IN NI.COM'S TERMS OF USE (http://ni.com/legal/termsofuse/unitedstates/us/).