Universal OCR

The Universal Word supports Arabic optical character recognition (OCR). To start the Arabic OCR program, ICRA, select Tools/Arabic OCR/OCR. You can recognize images in two ways:
1. From an image file
2. From a scanned image
To recognize an image file, select File/Open in ICRA. Once the image file is open, select Process/Recognize; a Save dialog opens. Type a file name and click “Save” to start recognition. The recognized text appears in an “Untitled” Universal Word window and at the bottom of the ICRA recognition window, and it is also saved as a text file on your hard disk.
For details on the other functions of ICRA, see ICRA Help.
Language Builder
The Arabic OCR allows you to build your own language and its associated rules. For more detailed information about this feature, please see ICRA Help.

Multi Language OCR:

The Universal Word supports character recognition (OCR) in twelve different languages. The languages supported are:
English (US), English (UK), French, Spanish, Portuguese, Italian, German, Dutch, Swedish, Danish, Norwegian, and Finnish.
The program provides the following features:
Automatic OCR:
This feature recognizes a file automatically, without letting you select zones: as soon as you select an image file or a scanned image, the OCR starts recognizing the whole image. The results are displayed in a Universal Word window. Images can be selected from the following sources:
-       Image File: an image previously scanned and saved to disk.
-       Scan Single Page: an image that you scan by itself. Each time you scan an image, you need to go through the menu sequence.
-       Scan Many Pages: images that you scan without going through the menu sequence for every scan. Each time a scan finishes, you are prompted to insert a new page into the scanner.
Manual OCR:
This feature lets you recognize a file manually: you can select zones or portions of text for recognition as you wish. Once you finish selecting the zones, click the “Close” button to start recognition. This mode gives you more control over which portions of the image are recognized. The results are displayed in a Universal Word document.
Options:
The Options menu provides the following:
-       Auto De-Skew Images
-       Auto Invert Images
-       Auto Flip Images
-       Keep Text Attributes during OCR
-       Do Not Filter Dirty Lines during OCR
-       View Image Before OCR (Auto Mode Only)
-       View Image After OCR (All Modes)
OCR Interaction
-       No consultation
-       Consult when needed
Page Layout Mode
-       Automatic
-       Force one column
-       Columns
-       Manual
When the “Consult when needed” option is selected, the OCR displays a menu highlighting each unrecognized character, and the placeholder letters “hmm” appear selected in an edit window. Type the replacement character each time this menu appears; the OCR corrects the recognition accordingly and goes on to recognize the rest of the image.
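The “Consult when needed” behaviour can be pictured as a confidence check on each recognized character. The Python sketch below is only an illustration under assumed names: `recognize_with_consultation`, the `(guess, confidence)` input format, the `consult` callback that stands in for the dialog, and the 0.6 threshold are all hypothetical and are not ICRA's interface.

```python
from typing import Callable, Optional

def recognize_with_consultation(
    glyphs: list[tuple[Optional[str], float]],
    consult: Callable[[int], str],
    threshold: float = 0.6,  # hypothetical confidence cut-off
) -> str:
    """Build the output text, consulting the user on doubtful glyphs.

    glyphs holds one (best_guess, confidence) pair per character image;
    best_guess is None when nothing matched. consult(i) stands in for
    the dialog and returns the character the user types for glyph i."""
    out = []
    for i, (guess, confidence) in enumerate(glyphs):
        if guess is None or confidence < threshold:
            out.append(consult(i))  # ask the user, then carry on
        else:
            out.append(guess)       # confident result, no dialog shown
    return "".join(out)

# The user is asked about glyphs 1 and 2 only.
answers = {1: "a", 2: "t"}
print(recognize_with_consultation([("c", 0.9), (None, 0.0), ("t", 0.4)],
                                  answers.__getitem__))  # cat
```

Under “No consultation”, the same loop would simply emit a placeholder instead of calling `consult`.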
Page Layout Mode
The OCR supports the following page layout modes:
-       Automatic
-       Force one column
-       Columns
-       Manual
Set OCR Languages
This feature selects the languages the OCR will recognize. To minimize errors, select only the language that needs recognition; for a multilingual document, select each of the corresponding languages.
If you check the “Linguistic Recognition” box, the OCR also checks the applicable dictionaries for more accurate recognition.
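One common way a dictionary can improve accuracy is by choosing among alternative readings of a word. The sketch below assumes the recognizer produces scored candidates per word; `pick_word` and its strategy are hypothetical illustrations, not a description of how ICRA's linguistic recognition works internally.

```python
def pick_word(candidates: list[tuple[str, float]], dictionary: set[str]) -> str:
    """Choose one reading of a word from (text, score) candidates.

    Assumed strategy: take the best-scoring candidate found in the
    dictionary; if none is a dictionary word, keep the best-scoring one.
    candidates must not be empty."""
    ranked = sorted(candidates, key=lambda cand: cand[1], reverse=True)
    for text, _score in ranked:
        if text in dictionary:
            return text
    return ranked[0][0]

# "cIear" (capital I) scores higher, but only "clear" is a real word.
print(pick_word([("cIear", 0.8), ("clear", 0.7)], {"clear", "text"}))  # clear
```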
Scan Only
This feature allows you to scan images without activating the OCR.
View Only
This feature lets you view the scanned images in the OCR program. The images are monochrome, single-bit images.

 

OCR is the Optical Character Recognition system. Understanding it involves comparing scanned images with electronic text. Scanned images are obtained through scanners, which capture the page much as a photocopier does. The scanner translates the image into a grid, or map, of millions of dots. Each dot, also referred to as a bit in computer terminology, is assigned a value by the scanner interface. For a binary scanner, this value is either "0", representing a blank area, or "1", representing an inked area.
The number of dots forming the page map depends on the scanner RESOLUTION. For example, at a resolution of 300 dots per inch (dpi) there are 300 × 300 = 90,000 dots per square inch, so a full letter-size page (8.5 × 11 inches) is formed of 8,415,000 dots; an A4 page (8.27 × 11.69 inches) has about 8.7 million. The map of these dots (bits), hence the name bit-map, is much like a photograph or a painting: the only way to make changes is at the bit level, by changing the colors of dots.
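The dot arithmetic above can be checked with a few lines of Python. This is a sketch assuming an uncompressed 1-bit bitmap with no per-row padding; `page_dots` and `bitmap_bytes` are illustrative names, not part of any scanner API.

```python
def page_dots(width_in: float, height_in: float, dpi: int) -> int:
    """Dots (bits) in a binary scan of a page of the given size in inches."""
    return round(width_in * dpi) * round(height_in * dpi)

def bitmap_bytes(dots: int) -> int:
    """Bytes needed for an uncompressed 1-bit bitmap (8 dots per byte)."""
    return (dots + 7) // 8

print(300 * 300)                          # 90000 dots per square inch
letter = page_dots(8.5, 11, 300)
print(letter)                             # 8415000
print(bitmap_bytes(letter))               # 1051875 bytes, about 1 MB
```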

Text characters, on the other hand, are assigned identity codes, commonly known as ASCII codes. Different code sets, referred to as code pages, can be used to assign codes to characters. Word processors, spreadsheets, databases, and other text-processing systems all manipulate these character codes, so text files can be modified at the character level rather than at the bit level as with images.
OCR is thus the process of converting the bit-map of a scanned image containing text into text codes (ASCII). At first glance, OCR seems simple compared with human reading. For a computer, however, OCR is a sophisticated and computationally heavy application.
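To make the contrast with bit-maps concrete, the sketch below shows text as a sequence of numeric codes. It uses Python's built-in `ascii` codec and its `cp1256` codec (the Windows Arabic code page); `to_codes` is an illustrative helper, and Windows-1256 is just one example of an Arabic code page, not necessarily the one ICRA uses.

```python
def to_codes(text: str, code_page: str) -> list[int]:
    """Return the numeric codes that represent text in a code page."""
    return list(text.encode(code_page))

print(to_codes("OCR", "ascii"))   # [79, 67, 82]
# Arabic is outside 7-bit ASCII; Windows-1256 maps alef and beh to 0xC7, 0xC8.
print(to_codes("اب", "cp1256"))   # [199, 200]

# Editing text changes codes, not dots: swap one character for another.
print("OCR".replace("C", "M"))    # OMR
```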

Research in the OCR field started in the late 1940s, and the first (Latin) OCR system reached the market in 1954. Early systems were limited to specially designed fonts such as OCR-A and OCR-B. OCR technology has since grown in popularity and sophistication: by the 1970s, trainable systems had appeared on the market. These systems were designed to cover the growing variety of fonts used in typewriters and computers.
In early OCR technology, recognition merely compares the bit-map of a character against a library of character shapes. This method, called PATTERN MATCHING, works only for a specified number of fonts and sizes, whereas characters vary considerably in font and size.
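Pattern matching can be sketched as a nearest-template search. In this minimal Python example, each glyph is a small same-sized bit-map and the distance is the number of differing dots (Hamming distance); the three-character `LIBRARY` is hypothetical. Because the comparison is dot by dot, it only works when the glyph has the same size as the templates, which is the limitation described above.

```python
Bitmap = tuple[str, ...]  # rows of "0"/"1", all templates the same size

def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count the dots that differ between two same-sized bit-maps."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def match(glyph: Bitmap, library: dict[str, Bitmap]) -> str:
    """Return the library character whose template differs least."""
    return min(library, key=lambda ch: hamming(glyph, library[ch]))

LIBRARY = {  # hypothetical 3 x 3 templates
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}

# A slightly noisy "I" (one extra dot) still matches best.
print(match(("010", "010", "011"), LIBRARY))  # I
```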

Later OCR technology is based on FEATURE EXTRACTION, which identifies a character based on local and topological characteristics. Starting with the bit-map of a character, the process analyses the overall shape by extracting individual features, such as curves, corners, and line breaks. The main advantage of feature extraction is the ability to recognize several fonts and sizes without learning each font in advance. This technology, however, requires relatively strong processing power, and it is implemented in most Latin OCR products.
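As a simple stand-in for feature extraction, the sketch below uses zoning: the ink density in each cell of a grid laid over the glyph. This is a swapped-in technique, simpler than the curve and corner features named above, and the prototypes are hypothetical. Because the features are fractions rather than raw dots, a glyph drawn at a different size can still land near its prototype.

```python
import math

Bitmap = list[list[int]]  # 1 = ink, 0 = blank

def zone_features(bm: Bitmap, zones: int = 2) -> list[float]:
    """Ink density of each cell in a zones x zones grid over the glyph."""
    h, w = len(bm), len(bm[0])
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            rows = range(zr * h // zones, (zr + 1) * h // zones)
            cols = range(zc * w // zones, (zc + 1) * w // zones)
            cells = [bm[r][c] for r in rows for c in cols]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats

def classify(bm: Bitmap, prototypes: dict[str, list[float]]) -> str:
    """Nearest prototype by Euclidean distance in feature space."""
    f = zone_features(bm)
    return min(prototypes, key=lambda name: math.dist(f, prototypes[name]))

# Prototypes come from tiny 2 x 2 glyphs; the test glyph is 4 x 4.
prototypes = {"L": zone_features([[1, 0], [1, 1]]),
              "Γ": zone_features([[1, 1], [1, 0]])}
big_l = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1]]
print(classify(big_l, prototypes))  # L
```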
We now introduce a more recent OCR technology, the OMNI OCR engine. It can recognize any text without the system being taught the font and without taking any shape samples. The OMNI engine adopts new technologies, one of which is Common Fate Technology. This technology is based on intensive geometrical analysis of shapes: the engine evaluates more than one possible solution, based on analysis of the given text data, and selects the best one to achieve the best accuracy.

Adding the OMNI OCR engine does not deprive the user of the trainability features already available in previous versions of ICRA.

The nature of printed Arabic text is radically different from that of Latin and other languages, which has a direct impact on the OCR process. The main technical challenge is the CURSIVENESS of Arabic text: printing Arabic as a sequence of isolated, unconnected characters is unacceptable and is not a normal way of writing Arabic. Therefore, the major problem to be resolved is the SEGMENTATION of words into characters. The concern in Latin OCR is rather different: there, the RECOGNITION of characters in different fonts and sizes is the major focus.
Another difficulty, which occurs in many Arabic fonts, is that characters OVERLAP each other, either merging into one solid form or overhanging their positions within the word. This implies a larger number of pattern classes to be recognized than in Latin languages.
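A classic starting point for segmenting cursive words is the vertical projection profile: columns where only the thin connecting baseline carries ink are candidate cut points. The sketch below is an illustrative assumption, not ICRA's segmenter, and the `max_ink` thickness is a hypothetical parameter. Overlapping characters defeat this simple column test, and some letters contain thin strokes of their own, which is why practical engines weigh several alternative segmentations.

```python
Bitmap = list[list[int]]  # 1 = ink, 0 = blank; one row per scan line

def column_profile(bm: Bitmap) -> list[int]:
    """Number of ink dots in each column of a word image."""
    return [sum(col) for col in zip(*bm)]

def candidate_cuts(bm: Bitmap, max_ink: int = 1) -> list[int]:
    """Columns where only a thin connecting stroke is present.

    Assumption: letters in a cursive word are joined by a baseline stroke
    about max_ink dots thick, so columns with 0 < ink <= max_ink are
    candidate cut points. Each run of such columns yields its middle."""
    cuts, run = [], []
    for x, ink in enumerate(column_profile(bm) + [max_ink + 1]):  # sentinel
        if 0 < ink <= max_ink:
            run.append(x)
        elif run:
            cuts.append(run[len(run) // 2])
            run = []
    return cuts

# Two letter bodies joined by a one-dot-thick baseline.
word = [[1, 1, 0, 0, 0, 1, 1],
        [1, 1, 0, 0, 0, 1, 1],
        [1, 1, 1, 1, 1, 1, 1]]
print(column_profile(word))  # [3, 3, 1, 1, 1, 3, 3]
print(candidate_cuts(word))  # [3]
```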

It is important to point out that some Arabic characters are highly ambiguous in shape.

Data Entry
The major bottleneck in many large-scale business applications today is getting the data embedded in documents into the machine. This requires data to be keyed in, and key entry was (and still is) an expensive, repetitive, and time-consuming process for large amounts of text. Recent ergonomic research has also shown that excessive keyboarding causes physical problems for data entry staff. OCR can provide a reasonable, practical solution and can be used to generate full-text data banks.
Desktop Publishing
OCR extracts text from input documents; once recognized, the text can be exported to other documents and published in different environments.

Automatic Indexing
There is a growing need for efficient automatic indexing of documents. Indexing defines a link between the stored images and the information those images contain. Typically, the operator of an input station feeds documents into the scanner and then types in a logical index. OCR focused on a certain zone of the page can do this task with minimal human intervention.
Basic OCR Technology
Trainable Approach Extended with OMNI Properties
Even though recognition in ICRA is based on font training, the OMNI approach to OCR is still present, in a way. This is achieved in two ways:
First, the user can have any number of fonts active during recognition; choosing a suitable set of fonts can provide wide coverage of a large number of fonts. Users can build their own library of fonts with which to recognize any new font.

Second, recognition can be size dependent or size independent. Size independence provides wider coverage of fonts and sizes, but at the expense of accuracy.
This compromise between OMNI and TRAINABLE technologies is an attempt to combine the advantages of both approaches. The main advantage of the OMNI approach is that almost any text can be recognized without first training the system on the font used to write it. This, however, comes at the expense of lower accuracy for non-standard fonts, where some characters may have slight shape variations. TRAINABLE technology solves the problem of these "unusual" fonts, and a system that uses this approach can recognize virtually any font. This, however, comes at the expense of time, since each new font first requires training.
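Size-independent recognition can be approximated by rescaling every glyph to a fixed grid before matching. The sketch below assumes nearest-neighbour sampling; `normalize` and the grid size are illustrative choices. Rescaling discards detail from large glyphs and coarsens small ones, which is one plausible reason size independence costs accuracy.

```python
Bitmap = list[list[int]]  # 1 = ink, 0 = blank

def normalize(bm: Bitmap, size: int = 8) -> Bitmap:
    """Rescale a glyph to size x size by nearest-neighbour sampling."""
    h, w = len(bm), len(bm[0])
    return [[bm[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

small = [[1, 0],
         [0, 1]]
large = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
# After normalization both sizes compare equal, so one template covers both.
print(normalize(small, 4) == normalize(large, 4))  # True
print(normalize(large, 2))  # [[1, 0], [0, 1]]
```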

Shape Training
Recognition in ICRA is completely based on font training. Unlike the Latin character set, where each character has a unique, distinct shape, the Arabic character set contains characters that share an identical shape and are differentiated only by their hangers (dots, double dots, etc.). "Font training" in ICRA is therefore done by learning shapes rather than characters, which greatly reduces the training time for a font: if three characters share the same shape, only one sample is needed to train the system to recognize all three.
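The saving from shape-based training can be sketched as a two-step lookup: recognize the body shape once, then let the dots select the letter. In the medial (and initial) form, beh, teh, theh, noon, and yeh share the same tooth-like "nabra" body and differ only in the number and position of their dots. The table layout and the `resolve` function are hypothetical illustrations, not ICRA's internal representation.

```python
import unicodedata
from typing import Optional

# Hypothetical table: one trained body shape (the medial "nabra" tooth)
# plus its dots ("hangers") identifies the letter.
SHAPE_CLASSES = {
    "nabra": {
        (1, "below"): "ب",  # beh
        (2, "above"): "ت",  # teh
        (3, "above"): "ث",  # theh
        (1, "above"): "ن",  # noon
        (2, "below"): "ي",  # yeh
    },
}

def resolve(body: str, dots: int, position: str) -> Optional[str]:
    """Letter for a recognized body shape and its dots, or None if unknown."""
    return SHAPE_CLASSES.get(body, {}).get((dots, position))

print(unicodedata.name(resolve("nabra", 2, "above")))  # ARABIC LETTER TEH
```

One trained sample of the nabra body therefore serves all five letters in this table.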

Page Analysis
Page analysis is one of the most useful features of any OCR product. Its main aim is to retain as much of the structure of the text in the image as possible in the output text. This is especially necessary for newspaper and magazine articles. Given a page, ICRA processes the entire page and identifies zones of text and graphics. Each text zone is recognized as a separate entity, producing a corresponding block of text, while graphics zones are excluded from recognition.
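One simple building block of page analysis is splitting a page into zones at wide blank gaps. The sketch below performs a single horizontal pass of the kind used in recursive X-Y cut layout analysis; it is an assumed approach for illustration, not ICRA's algorithm, and `min_gap` is a hypothetical parameter. A full analyser would also cut vertically to separate columns and classify each zone as text or graphics.

```python
Bitmap = list[list[int]]  # 1 = ink, 0 = blank

def horizontal_blocks(page: Bitmap, min_gap: int = 2) -> list[tuple[int, int]]:
    """Split a page into bands of rows separated by blank gaps.

    Returns inclusive (first_row, last_row) pairs. A run of at least
    min_gap blank rows closes a block; shorter gaps, such as the space
    between lines of one paragraph, stay inside the block."""
    blocks: list[tuple[int, int]] = []
    start, end, blank = None, -1, 0
    for r, row in enumerate(page):
        if any(row):
            if start is None:
                start = r
            end, blank = r, 0
        else:
            blank += 1
            if start is not None and blank >= min_gap:
                blocks.append((start, end))
                start = None
    if start is not None:
        blocks.append((start, end))
    return blocks

page = [[1, 0], [0, 1], [0, 0], [1, 1],   # paragraph: lines with a small gap
        [0, 0], [0, 0],                   # wide gap between zones
        [1, 0], [0, 1]]                   # second zone
print(horizontal_blocks(page))  # [(0, 3), (6, 7)]
```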

Adaptive Trainability
This feature enhances the TRAINABILITY approach used in ICRA, since a wide range of irregular shapes may appear in uncommon fonts. An irregular shape is defined as a sequence of basic character shapes. Adaptive Trainability solves the problem of the many unknown irregular shapes that may appear in some fonts: it lets the user define any new irregular shape by combining a number of basic character shapes. The user can then take samples of this new shape, and the system will identify it. ICRA uses an expert system, based on a knowledge base of rules, to identify characters, and the rules for any new irregular shape are acquired dynamically.
Retain Page Formatting
ICRA attempts to reproduce the original page formatting and layout as closely as possible in the recognized document by preserving the following attributes:

Relative text column positioning
Margins
Tabs
Inter-line spacing
Indentation
Blank vertical spacing

This chapter explains how to use the ICRA Gold commands integrated with Universal Word 2000. The user has three menus, which are explained below.
OCR Menu
The ICRA Gold Main menu is a part of the Universal Word 2000 menus. It appears as follows from the Tools menu:
Where:
File/Settings:          Update the ICRA Gold settings.
Update Irregulars:      Define and update irregular shapes.
Learned Font:   Learn a new font to be used in recognition.
Panel:                  Show the Panel window.
Mode Menu
The ICRA Mode menu option is used to enter ICRA mode in order to start the recognition, learning, and training processes. This menu option is available from the Image menu, as follows:

Recognize Menu
The Recognize menu option is used to start recognition using the new font. This menu option is available from the Image menu.
Settings
ICRA Gold recognizes images in two ways:
With the OMNI base enabled.
With the OMNI base disabled.
Before recognizing any marked blocks of text in the currently loaded image, the user has to make sure that the appropriate settings are chosen. This is done from the Settings option in the File menu. The following sections describe the different settings options and their functions.
To define or change the settings, select Settings from the ICRA submenu of the Tools menu. The Language, OCR, and Output tabs appear.
Language
This version of ICRA Gold recognizes the Arabic language.

To select a language, follow these steps:
Select the required language from the Available Languages list box. The properties of the selected language are displayed to the right, in the Properties section. These properties are:
Direction:              
        Left to right
        Right to left   
Cursive:
        Cursive
        Non-cursive
Diacritic:
        Language contains diacritics
        Language doesn't contain diacritics.
Click the OK button.
To exit without changing the previous selected language, click Cancel.
OCR
The OCR tab is where the user selects the parameters that control the recognition of an image.
To define or modify the contents of the OCR tab:
Select the OCR Tab from the ICRA Gold Setting dialog box.

Recognition Base
The recognition base determines how the text will be treated during recognition. The user can either select or clear the Enable OMNI Base option.
The OMNI Base
The OMNI Base is a base created from a variety of samples taken from the most general features a regular font can have. These samples are chosen to cover the majority of shapes an image may contain. This option is valid only for the Arabic language.
Select the Enable OMNI Base check box to perform OMNI or partially OMNI recognition of Arabic text.
The Font Family
This option is valid only for the Arabic language.  Font family specifies the category in which the font to be recognized falls.
Regular:  Select Regular if the font is not Koufi.
Koufi:  Select Koufi if the font is Koufi.
Text Nature:  This option is valid only for the Arabic language. It specifies the source of the font; select one or more of the following:

Unknown:  Select this option if the source of the font is unknown.
Known:  Select this option if the source of the font is one of the following:
Computer Output:  Select Computer Output if the font is produced by computer printing.
Books:  Select Books if the font comes from books.
Newspapers:  Select Newspapers if the font source is a newspaper.
Magazines:  Select Magazines if the font source is a magazine.
Typewriter Output:  Select Typewriter Output if the font comes from a typewriter.
Note: Specifying the text nature helps achieve better accuracy in the recognition of the selected text.
Trainable Fonts
This list box displays all the fonts for the currently selected language. Selecting a font adds it to the recognition base; clearing it removes it from the recognition base.

That is, if the user selects one of the trainable fonts, the font is used to complement the OMNI Base in the recognition process.

Noise Level
This facility filters noise in a document out of the output text. If a document contains a small amount of noise, this value can be adjusted to filter it out. Increasing the value filters out noise of larger dimensions, while decreasing it limits filtering to noise of smaller dimensions. Handle this option with care, however: if the value is too large, parts of the actual text (e.g. dots and double dots) can be excluded, causing misrecognition. If you try to take samples of dots in the training utility and they do not appear in the object window, the value is most probably too large and should be decreased.
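A noise-level filter of this kind is often implemented by removing small connected components of ink. The sketch below assumes that "dimension" means the number of dots in a component and that diagonal neighbours are connected (8-connectivity); both are assumptions, and `filter_noise` is an illustrative name. Setting `noise_level` above the size of a real dot would erase it, which is exactly the misrecognition risk described above.

```python
from collections import deque

Bitmap = list[list[int]]  # 1 = ink, 0 = blank

def filter_noise(bm: Bitmap, noise_level: int) -> Bitmap:
    """Erase connected ink components with fewer than noise_level dots."""
    h, w = len(bm), len(bm[0])
    out = [row[:] for row in bm]
    seen = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if not bm[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one 8-connected component.
            comp, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and bm[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(comp) < noise_level:
                for y, x in comp:
                    out[y][x] = 0  # treat the small component as noise
    return out

speck_page = [[1, 1, 1, 1, 0, 0],
              [0, 0, 0, 0, 0, 1]]
print(filter_noise(speck_page, 2))  # [[1, 1, 1, 1, 0, 0], [0, 0, 0, 0, 0, 0]]
```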
Allow Disconnection

This option is only valid for the Cursive languages. 
Select the Allow Disconnection option when you want to recognize cursive text that is disconnected at some places due to bad quality printing.
Filter Graphics
Filter Graphics means that the system excludes any graphs or drawings from the recognition of the image.

Output
The Output options determine how the recognized text will appear in the text file.
Select the Output tab to define or modify the output text settings. It appears as follows:
In the figure above, the user can edit the output settings for the recognized image. The following options are available:
Save Output As
This option selects the files in which the recognition results will be saved. It can be:
One File
Multiple blocks are appended one after the other in the same file.
Block Separator enabled
A block separator is any piece of text used, within a single file, to separate successive blocks of text that have been marked or automatically detected. It is valid only when multiple blocks are saved in One File. Select Block Separator Enabled if you want a block separator to appear between blocks of text.

Before Text: This option is selected if the user wants the block separator to appear before every block of text.
After Text: This option is selected if the user wants the block separator to appear after every block of text.
Output Format
This option selects the format in which the recognition results will be saved.
Text Format:  The output text is in plain ASCII format.
Vertical Indentation:  The vertical indentation is used to retain the vertical indentation of the output text.
Horizontal Indentation:  The horizontal indentation is used to retain the horizontal indentation of the output text.
Un-recognized Character:  From this option the user can specify the character to appear in the output text for any un-recognized character.
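The One File, Block Separator, and Un-recognized Character options can be summarised as one assembly step. The sketch below is hypothetical: `build_output`, its parameters, and the default `"~"` substitute character are illustrative assumptions, not ICRA settings names or defaults.

```python
from typing import Optional

def build_output(blocks: list[list[Optional[str]]],
                 separator: Optional[str] = None,
                 placement: str = "after",
                 unrecognized: str = "~") -> str:
    """Join recognized blocks into the text of one output file.

    blocks: one list of characters per block; None marks a character
    the OCR could not recognize. separator: block separator text, or
    None when Block Separator is disabled. placement: "before" or
    "after" every block. unrecognized: the substitute character."""
    parts = []
    for block in blocks:
        text = "".join(unrecognized if ch is None else ch for ch in block)
        if separator is not None:
            text = separator + text if placement == "before" else text + separator
        parts.append(text)
    return "".join(parts)

print(repr(build_output([["a", None], ["b"]], "\n--\n", "after", "?")))
# 'a?\n--\nb\n--\n'
```

With the separator disabled, the blocks are simply appended one after the other, as the One File option describes.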

Creating Trainable Fonts
The main idea behind creating trainable fonts is to enhance the recognition accuracy for documents.
The system is designed to learn the basic character shapes of any language that has been defined using the "Language Builder" module. A basic character shape can be a diacritic (e.g. a Dot or a Hamza in Arabic), a special character (e.g. ? or ! in Latin), or an Arabic ligature that has been predefined in the system or defined using the Shape Designer. For languages that have upper- and lower-case characters (e.g. English), samples of both the upper- and lower-case forms of every character must be taken.
Character Forms
Any language defined using the language builder can either be cursive or non-cursive.
All character shapes in a non-cursive language can occur only in ISOLATED form.
For a cursive language, each character shape can occur in one or more of the following positions:

START
MIDDLE
FINAL
ISOLATED
Some of the shapes may be present in only one or two positions while others may occupy all four positions.
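Which of the four positions a letter takes depends on whether it connects to its neighbours. The sketch below assigns START, MIDDLE, FINAL, or ISOLATED to each letter of an Arabic word in logical (reading) order. Its joining classes are a simplified subset of the Unicode joining types and are an assumption of this example: letters such as alef, dal, reh, and waw connect only to the preceding letter, and hamza on the line connects to neither side.

```python
# Simplified joining classes (assumption: a subset of Arabic letters).
RIGHT_JOINING = set("اأإآدذرزوؤة")  # connect to the previous letter only
NON_JOINING = set("ء")               # hamza on the line connects to neither

def joins_next(ch: str) -> bool:
    """True if ch can connect to the letter that follows it."""
    return ch not in RIGHT_JOINING and ch not in NON_JOINING

def forms(word: str) -> list[str]:
    """Positional form of each letter, in logical (reading) order."""
    result = []
    for i, ch in enumerate(word):
        before = i > 0 and joins_next(word[i - 1]) and ch not in NON_JOINING
        after = (i + 1 < len(word) and joins_next(ch)
                 and word[i + 1] not in NON_JOINING)
        if before and after:
            result.append("MIDDLE")
        elif before:
            result.append("FINAL")
        elif after:
            result.append("START")
        else:
            result.append("ISOLATED")
    return result

print(forms("بيت"))  # ['START', 'MIDDLE', 'FINAL']
print(forms("باب"))  # ['START', 'FINAL', 'ISOLATED']
```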
Ligatures
Due to the complexity of the printed text of some languages, a special provision has been made in the system to handle ligatures. Ligatures are characters that overlap each other vertically, which makes decomposing them into individual characters a challenging problem.
Any encountered ligature can be defined using the Shape designer and hence samples of it can be taken.
Touching Shapes
Touching shapes are consecutive character shapes that connect together when the distance between them becomes very small. It was found that some touching shapes are very common in some fonts of some languages.

In the Shape Designer, a ligature or touching shape can consist of at most seven basic shapes.
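An irregular shape (a ligature or touching shape) can be recorded simply as the sequence of basic shapes it is built from, with the seven-shape limit checked when it is defined. `IrregularShapeLibrary` is a hypothetical sketch; the minimum of two basic shapes is an assumption of this example, not a documented rule.

```python
MAX_BASIC_SHAPES = 7  # Shape Designer limit stated in the text

class IrregularShapeLibrary:
    """Hypothetical store of irregular shapes (ligatures, touching shapes),
    each recorded as the sequence of basic shapes it is built from."""

    def __init__(self) -> None:
        self.shapes: dict[str, tuple[str, ...]] = {}

    def define(self, name: str, basic_shapes: list[str]) -> None:
        # Assumption: an irregular shape combines at least two basic shapes.
        if not 2 <= len(basic_shapes) <= MAX_BASIC_SHAPES:
            raise ValueError(
                f"{name!r}: expected 2 to {MAX_BASIC_SHAPES} basic shapes, "
                f"got {len(basic_shapes)}")
        self.shapes[name] = tuple(basic_shapes)

    def expand(self, name: str) -> tuple[str, ...]:
        """Basic shapes, in sequence order, that make up an irregular shape."""
        return self.shapes[name]

lib = IrregularShapeLibrary()
lib.define("lam-alef", ["lam", "alef"])
print(lib.expand("lam-alef"))  # ('lam', 'alef')
```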
Learning Font Window
The Learning Font window contains a list of all the trainable fonts available in the system. It lets the user create new trainable fonts and add them to the font library.
To select a font or create a new font, select Learning Font from the ICRA submenu of the Tools menu. The following window appears:
The font currently used in recognition is displayed at the top of the window, along with a list of trainable fonts.
To add samples to a font in the list, do the following:
Place the cursor on the Add Sample to Font edit box.
Write the name of the desired font.

Then click OK.
Main Body Window
The Main Body window is where the object the user has selected to take samples from is placed. These shapes can be either regular or irregular.
This window appears when the user double-clicks a character in the Magnifying Glass window.
In the following figure the Main Body window appears: 
In the window above, note the following:
The constituent characters of the object will appear in Black.  These are the shapes that can be taken as samples. 
The Light Gray portions of the object are discarded by the system and cannot be taken as samples. 
The Red portion of the object represents the currently selected basic shape.
The two dark gray portions represent regions that can be filled in black, joining the two black regions on either side into one valid basic shape.

To switch to the fill mode in order to fill these regions, click the Fill button.
Then click the required dark gray region. 
To turn fill mode off, click the Arrow button.
The Cutting Options list
The cutting options list contains the cutting options available for the currently selected object (shape).
Select the option that will correctly segment the object into a number of valid character shapes.       
When you click the button, the Selected Shapes window appears.
To define a regular shape, the user can do the following:
Select the character(s) that should be defined by one of the following buttons:
The Cutting Options button.
The Filling button.
The Arrow button.
Left-click the appropriate shape in the Panel window.
Repeat this for the rest of the shapes in the Object window.
To define an Irregular shape the user can do the following:

Select the character(s) that should be defined by one of the following buttons:
The Cutting Options button.
The Filling button.
The Arrow button.
Define these characters from the Panel window using the right mouse button.
The defined characters appear, in order, in the lower part of the Main Body window.
To add the defined shape to the irregular shapes library, click the Add button. A dialog box appears asking whether you want to define this sample.
Click Yes to open the Shape Designer window, where you can adjust the selected shape.
To undo the last selected shape, press the undo button.
Tip: You can cancel the whole selected shape in the lower part of the Main Body window by pressing the Delete button.
Magnifying Glass Window
The Magnifying Glass window appears as soon as you double click on any part of the recognized text.  It magnifies the same part of the text that you double clicked.

Double-clicking any object in the Magnifying Glass window brings up the Main Body window with the selected object.
Panel Window
The Panel window contains the basic character shapes, as well as the diacritics (if any), of the currently selected language; for languages other than Arabic, these are the shapes defined using the Language Builder. Each button contains the basic shape of a character or a diacritic.
The shades of blue on a character shape's button reflect the positions for which samples have been taken.
A character shape's button that appears in normal light gray indicates that no samples have been taken for that character shape.
A character shape's button whose lower half is blue indicates that at least one sample has been taken but that at least one position still has no sample.
A character shape's button that appears in blue color indicates that all of the character shape's positions have been filled with samples.

Tip: Diacritics can only accept samples in ISOLATED form. Thus these character shapes require only an ISOLATED sample for their color to switch to blue.
To select one of the basic character shapes for sample taking just click with the mouse on its corresponding button in the panel.
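The button colouring described above follows directly from comparing the sampled positions with the positions a shape can occupy. `button_state` and its colour strings are hypothetical names for this sketch; the rule that diacritics need only an ISOLATED sample comes from the tip above.

```python
ALL_POSITIONS = {"START", "MIDDLE", "FINAL", "ISOLATED"}

def button_state(sampled: set[str], valid: set[str]) -> str:
    """Colour of a Panel button for one basic character shape.

    sampled: positions that already have at least one sample.
    valid: positions in which the shape can occur; for diacritics
    this is just {"ISOLATED"}."""
    taken = sampled & valid
    if not taken:
        return "light gray"        # no samples yet
    if taken == valid:
        return "blue"              # every valid position has a sample
    return "lower half blue"       # some positions still lack samples

print(button_state({"START", "FINAL"}, ALL_POSITIONS))  # lower half blue
print(button_state({"ISOLATED"}, {"ISOLATED"}))         # blue
```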
Shape Designer Window
The Shape Designer window is used for designing and defining new irregular character shapes that are unknown to the system.  This increases the range of irregular character shapes that the system can identify.
To define and design a new irregular character shape, the user must choose the sequence of basic character shapes from the Panel window. The associated bitmap from the Main Body window then appears in the Shape Designer window.
Note: Any basic character can be used to design a new shape. Up to 7 basic characters can be used to build a shape.

Shape Designer window
The Sample button fills the Shape Designer window with the bitmap the user has selected in the Main Body window.
The Clear button is used to empty the Shape Designer from any bitmap.
The Text button changes the bitmap of the irregular shape currently selected in the Defined Shapes list to a textual representation that the user specifies.
The button is used to edit the bitmap representing any new shape.
The button is used to erase any part of the bitmap representing the new shape. After defining the sample, press the Add Sample button to store the designed sample in the Irregular Shapes library.
Tip: To cancel all operations, click the Cancel button.
Diacritics
If the shape contains a diacritic, the diacritic is placed in the Shape Designer according to a classification that depends on its position relative to the other basic shapes. There are four classifications:

Upper Child
In the first position, an upper child diacritic is an upper child of the second basic shape in the sequence of basic shapes that constitutes the irregular shape.
For example, the irregular shape consists of two basic shapes: the nabra and the triple dots.
In this case, the character and its upper child are defined from the Selected Shape window.
When the user right-clicks the diacritic character, a dialog box appears in which the user can select the position type of the diacritic, as follows:
In the middle position, if the diacritic is neither the first nor the last in the sequence of shapes, it can be classified as an upper child of the basic shape that comes next in the sequence.
The first character shape above is an irregular shape that consists of a sequence of three basic shapes, which are the nabra, the dot and the hah.  The dot here is an upper child of the hah character.

The character and its upper child are defined from the Selected Shape window.
Lower Child
In the middle position, if the diacritic is neither the first nor the last in the sequence of shapes, it can be classified as a lower child of the basic shape that comes before it in the sequence.
The first character shape is an irregular shape that consists of a sequence of three basic shapes: the nabra, the double dots, and the meem. Here the double dots are a lower child of the basic shape nabra.
In the last position, a lower child diacritic is a lower child of the basic shape that precedes it in the sequence of basic shapes that constitutes the irregular shape.
For example, the irregular shape consists of three basic shapes: the feh, the yeh, and then the double dots, where the double dots are the lower child of the yeh.
This irregular shape is defined by right-clicking in the Panel window, and it appears in the Selected Shapes window.

Upper Neighbor
In the first position, an upper neighbor diacritic does not belong to the second basic shape in the sequence; instead, it is considered an upper child of the character that comes before it in the surrounding text.
The first character shape above is an irregular shape that consists of a sequence of two basic shapes, the hamza and the lam, where the hamza is the upper neighbor of the lam. The hamza does not belong to the lam; it is an upper child of the character that comes before the lam in this particular piece of text.
In the last position, an upper neighbor diacritic does not belong to the previous basic shape in the sequence; instead, it is considered an upper child of the character that comes after it in the surrounding text.
In the shape above, the last character shape is an irregular shape that consists of three basic shapes: the lam, the alef, and the double dots. The double dots here are an upper neighbor diacritic to the alef: they do not belong to the alef but are considered an upper child of the character that comes after them in this piece of text.

Lower Neighbor
In the first position, a lower neighbor diacritic does not belong to the second basic shape in the sequence; instead, it is considered a lower child of the character that comes before it in the surrounding text.
In the above shape, the second character shape is an irregular shape that consists of two basic shapes: the double dots and the ein. Here the double dots represent a lower neighbor diacritic of the ein. They do not belong to the ein; instead, they are a lower child of the character that comes before them in this piece of text.
In the last position, a lower neighbor diacritic does not belong to the previous basic shape in the sequence; instead, it is considered a lower child of the character that comes after it in the surrounding text.
The above character shape is an irregular shape that consists of a sequence of two basic shapes: the reh and the double dots. Here the double dots represent a lower neighbor diacritic of the reh. They do not belong to the reh; instead, they are a lower child of the character that comes after them in this piece of text.
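To make the child/neighbor distinction concrete, here is a minimal Python sketch of how a recognizer might decide which character a first- or last-position diacritic in an irregular shape attaches to. It is not ICRA's implementation: `Side`, `Role`, `DiacriticSlot` and `owner_of` are hypothetical names. The sketch also assumes that a first-position child belongs to the next basic shape; the text above documents only the last-position child case.

```python
from dataclasses import dataclass
from enum import Enum


class Side(Enum):
    UPPER = "upper"
    LOWER = "lower"


class Role(Enum):
    CHILD = "child"        # belongs to a basic shape inside the irregular shape
    NEIGHBOR = "neighbor"  # belongs to a character outside the irregular shape


@dataclass
class DiacriticSlot:
    name: str
    side: Side
    role: Role
    index: int  # position of the diacritic within the basic-shape sequence


def owner_of(slot: DiacriticSlot, sequence: list[str]) -> str:
    """Describe which character a first- or last-position diacritic attaches to (hypothetical model)."""
    if len(sequence) < 2:
        raise ValueError("an irregular shape has at least two basic shapes")
    first = slot.index == 0
    last = slot.index == len(sequence) - 1
    if not (first or last):
        raise ValueError("this sketch only models first- and last-position diacritics")
    if slot.role is Role.CHILD:
        # Child: attached to the adjacent basic shape inside the sequence
        # (the first-position case is an assumption; the last-position case follows the text).
        owner = sequence[1] if first else sequence[-2]
    else:
        # Neighbor: attached to the character just outside the irregular shape.
        owner = "<previous character in text>" if first else "<next character in text>"
    return f"{slot.side.value} child of {owner}"


# The feh / yeh / double dots example: a lower child in the last position.
print(owner_of(DiacriticSlot("double dots", Side.LOWER, Role.CHILD, 2), ["feh", "yeh", "double dots"]))
# lower child of yeh
```

The vertical side (upper or lower) never changes the owner; it only says whether the diacritic becomes an upper or a lower child once its owner is known.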

Irregular Shapes Window
The Irregular Shapes window contains all the irregular shapes previously defined through the Shape Designer window.
It consists of three main sub-windows:
Defined Shapes: A list showing the pictorial representations of all previously defined irregular shapes.
Irregular Shape window: This window contains the pictorial representation of the irregular shape that is currently selected in the Defined Shapes list.
Basic Shapes window: It shows the constituent basic character shapes of the currently selected irregular shape in the Defined Shapes list.
The Text button is used to change the bitmap of the currently selected irregular shape in the Defined Shapes list to a textual representation.
Warning: Once you update the bitmap, you cannot restore it to its state before your last changes.
The Clear button is used to clear the Shape Designer window.

The Update button is used to update the bitmap of the currently selected irregular shape with the current bitmap in the Irregular Shape window.
The Delete button is used to delete the currently selected irregular shape in the Defined Shapes list.
The OK button is used to close the dialog and save all the changes you made.
The Cancel button is used to close the dialog and cancel all the changes you made.
The Add Sample Dialog
The Add Sample dialog appears only when a character shape is defined in Manual Learning Mode. In Automatic Learning Mode, the sample is saved immediately and the selection moves to the next character of the object. The dialog appears when a letter is defined from the Panel window.
This dialog contains two windows.  The left window shows the character shape taken as New Sample, while the right window shows the previously taken samples of the same character shape.

The user can see the currently existing samples and decide whether or not to save the new samples.
Press the Add Sample button to add the character shape.
Press the Close button to cancel the operation.
When you click the Add Sample button, the sample moves to the right window, indicating that it has been saved.
Press the Delete button to delete the current shape or the added shape.
Three small windows are located on the right side of the dialog.
The first shows the Total Number of previously taken samples available for this character shape.
The second shows the Position of the current sample relative to the previous or next character in the text. The position can be Start, Middle, Final or Isolated (as in this example).
The third is a miniature that shows the Shape of the current sample character displayed in the right window.
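The difference between the two learning modes can be sketched as follows. This is a hypothetical Python model, not ICRA's internal design: `SampleStore`, `learn` and the `confirm` callback, which stands in for the Add Sample dialog, are illustrative names. Keying samples by character and position is an assumption based on the Total Number and Position fields described above.

```python
from enum import Enum
from typing import Callable


class Position(Enum):
    START = "Start"
    MIDDLE = "Middle"
    FINAL = "Final"
    ISOLATED = "Isolated"


class SampleStore:
    """Hypothetical store of learned samples for one font, keyed by (character, position)."""

    def __init__(self) -> None:
        self._samples: dict[tuple[str, Position], list[tuple[str, ...]]] = {}

    def add(self, char: str, position: Position, bitmap: tuple[str, ...]) -> None:
        self._samples.setdefault((char, position), []).append(bitmap)

    def total(self, char: str, position: Position) -> int:
        # Corresponds to the "Total Number" field of the Add Sample dialog.
        return len(self._samples.get((char, position), []))


def learn(store: SampleStore, char: str, position: Position, bitmap: tuple[str, ...],
          manual: bool, confirm: Callable[[int], bool]) -> bool:
    """Save a sample; in manual mode, ask first (the role of the Add Sample dialog)."""
    if manual and not confirm(store.total(char, position)):
        return False  # the user pressed Close
    store.add(char, position, bitmap)  # automatic mode saves immediately
    return True
```

In automatic mode `confirm` is never called. In manual mode it receives the number of existing samples, mirroring how the dialog shows them before the user decides whether to add the new one.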

Learning Process
To start learning a new font, the user can do the following:
Open the archive and the folder where the required image exists.
Activate the image.
Select the ICRA Mode item from the Tools menu to enter ICRA Mode.
Define the required font from the Learning Fonts panel.
Activate the Panel window.
In the Image window, double-click each shape that will be trained to the system.
The Magnifying Glass window appears; double-click the selected shape in the Magnifying Glass window.
The Main Body window appears with the shape in it. In the Panel window, click the character that represents the selected shape.
Repeat these steps for all the shapes that will be defined to the system.
If a shape is irregular, define it using the irregular shapes process described above.
Language Builder
ICRA Gold is not only an Arabic OMNI base; it can also be trained to recognize other languages. This is the purpose of the Language Builder: the utility in which users define to the system the language, its basis and the rules that control the efficiency of the recognition process.

The Language Builder is also an important tool for broadening the range of images that can be recognized and for achieving the highest possible recognition accuracy.
The Language Builder Command Summary
The Language Builder command summary, which lists the operations the user can perform, includes:
Language Menu: The Language menu has the following items:
New: Create a new language.
Load: Display an existing language.
Delete: Delete an existing language.
Undelete: Restore deleted languages.
Save: Save the current active language.
Save As: Save the current active language with a new name.
Exit: Quit the application; you are prompted to save changes.

Toggle Language Menu: The Toggle Language menu switches the interface language from Latin to Arabic and vice versa, as shown in the following example:
Latin Language                  Arabic Language
Define a New Language: The user can define a new language to the system; the defined language can later be used to recognize any image in ICRA.
To define a language, follow these steps:
Select the New item from the Language menu.
Direction: The reading direction of the language, either Left to Right or Right to Left.
Cursive: A language is cursive if its words are formed of connected characters.
For example, Arabic is a cursive language, whereas printed English is generally not considered cursive.

Diacritic: A language is diacritic if some of its letters are distinguished from other letters by marks, also called hangers, placed above or below them.
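The three language properties above can be captured in a small record. The following hypothetical Python sketch (`Direction`, `LanguageDefinition` and `reading_order` are illustrative names, not part of ICRA) also shows one way a recognizer could use the Direction property: ordering the recognized words on a line by their horizontal position.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    LEFT_TO_RIGHT = "Left to Right"
    RIGHT_TO_LEFT = "Right to Left"


@dataclass(frozen=True)
class LanguageDefinition:
    name: str
    direction: Direction
    cursive: bool    # words are formed of connected characters
    diacritic: bool  # some letters are distinguished by hangers above or below


def reading_order(words: list[tuple[int, str]], language: LanguageDefinition) -> list[str]:
    """Order (x_left, text) word boxes from one text line in the language's reading direction."""
    right_to_left = language.direction is Direction.RIGHT_TO_LEFT
    return [text for _, text in sorted(words, key=lambda w: w[0], reverse=right_to_left)]


arabic = LanguageDefinition("Arabic", Direction.RIGHT_TO_LEFT, cursive=True, diacritic=True)
print(reading_order([(10, "a"), (50, "b"), (30, "c")], arabic))  # ['b', 'c', 'a']
```

For a right-to-left language the word with the largest x coordinate is read first, which is why the sort is simply reversed.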
A language is a set of Basic Shapes, Basic Shapes Rules and Equivalence Rules.  These three components together with the samples enable ICRA to correctly recognize a text written with this given language.  The following Basic Shape window appears so that the user can start defining the basic shapes of the defined language:

In the Basic Shape window, the user can define, add, edit and delete the basic shapes (characters and diacritics) that constitute the language. The window appears in the following figure:
As shown, the window is initially empty because no shapes have been defined yet.
To start defining a basic shape, press the New button. The following window appears:
Shape Name: Enter the name of the shape in the edit box. This can be any meaningful name, e.g. Aleph, Beh, Teh, etc.
Note: The shape name should not exceed 15 characters.
Root: Select this option to specify that the shape is a character shape that has a root. A root shape has valid and invalid positions.
For example, the Aleph may come at the end of a main body or as a stand-alone character, but it cannot come in the middle of a word. The positions of a character shape are:

Start: Select this button if the character can have a 'Start' position.
Middle: Select this button if the character can have a 'Middle' position.
Final: Select this button if the character can have a 'Final' position.
Isolated: Select this button if the character can have an 'Isolated' position.
Diacritics: Select this option to specify that the shape is a diacritic shape, e.g. Dot, Double Dot, Hamza, etc. The positions of a diacritic shape are:
Up: Select this button if the diacritic can have an 'Upper' position.
Down: Select this button if the diacritic can have a 'Lower' position.
A diacritic shape can come in one of the following modes:
Simple: Select this option if the diacritic has a simple form, not composed of other hangers, e.g. Hamza.
A simple diacritic can also come in an Isolated Form. For example, the Hamza is a hanger in some cases and an isolated character in others. If the hanger being defined can occur in both ways, select the "Isolated Form" check box; otherwise, leave it cleared.

Compound: Select this option if the diacritic has a compound form, composed of other hangers, e.g. Double Dots, which is composed of two Dots.
To define this option, drag each diacritic component from the 'Hangers' list box and drop it in the 'Components' list box.
After defining the properties of the selected shape, enter the shape's form in the blank box.
The shape is represented in bitmap form as soon as the Paste button is pressed. The user can edit the pasted shape by clicking in the Bitmap area.
To delete a shape from the bitmap area, press the Clear Selection button; the bitmap is completely cleared.
Press OK when finished.
The defined shape will appear in the Basic Shapes window, as follows:
To view the details of this shape, press the Details button; the Basic Shape Details reappear.
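The constraints on a basic shape definition can be summarized as a validation routine. This Python sketch is hypothetical (`BasicShape`, `Kind` and `validate` are not ICRA names). It encodes only the rules stated above: the 15-character name limit, the Start/Middle/Final/Isolated positions for root shapes, the Up/Down positions for diacritics, and the Simple/Compound modes with the Isolated Form option for simple diacritics.

```python
from dataclasses import dataclass, field
from enum import Enum

MAX_NAME_LENGTH = 15  # the shape name should not exceed 15 characters
ROOT_POSITIONS = {"Start", "Middle", "Final", "Isolated"}
DIACRITIC_POSITIONS = {"Up", "Down"}


class Kind(Enum):
    ROOT = "Root"
    DIACRITIC = "Diacritics"


@dataclass
class BasicShape:
    name: str
    kind: Kind
    positions: set[str]
    components: list[str] = field(default_factory=list)  # empty => Simple, else Compound
    isolated_form: bool = False  # the "Isolated Form" check box (simple diacritics only)

    def validate(self, known_hangers: set[str]) -> None:
        if not self.name or len(self.name) > MAX_NAME_LENGTH:
            raise ValueError(f"shape name must be 1 to {MAX_NAME_LENGTH} characters")
        allowed = ROOT_POSITIONS if self.kind is Kind.ROOT else DIACRITIC_POSITIONS
        if not self.positions or not self.positions <= allowed:
            raise ValueError(f"positions must be a non-empty subset of {sorted(allowed)}")
        if self.kind is Kind.ROOT and (self.components or self.isolated_form):
            raise ValueError("Simple/Compound modes apply only to diacritics")
        if self.components and self.isolated_form:
            raise ValueError("'Isolated Form' applies only to simple diacritics")
        unknown = [c for c in self.components if c not in known_hangers]
        if unknown:
            raise ValueError(f"unknown component hangers: {unknown}")


# Examples taken from the text: Aleph never occurs in the middle of a word;
# Double Dots is a compound of two Dots; Hamza may also occur in isolated form.
shapes = [
    BasicShape("Aleph", Kind.ROOT, {"Final", "Isolated"}),
    BasicShape("Dot", Kind.DIACRITIC, {"Up", "Down"}),
    BasicShape("Double Dots", Kind.DIACRITIC, {"Up", "Down"}, components=["Dot", "Dot"]),
    BasicShape("Hamza", Kind.DIACRITIC, {"Up", "Down"}, isolated_form=True),
]
for shape in shapes:
    shape.validate(known_hangers={"Dot", "Hamza"})
```

Modeling Simple versus Compound as "components is empty or not" keeps the two modes mutually exclusive without a separate flag.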

The Basic Shapes Rules define alphabet characters in terms of a basic shape and a hanger.
This window displays a Defined Rules list box showing all the available rules, and a group of check boxes showing the positions in which the defined rule applies.
To define a new rule, press the New button.
Select a basic shape from the 'Shapes' list.
Drag it; a drag cursor appears while dragging.
Drop it in the leftmost edit box.
Then, select the appropriate hanger from the Hangers list.
Drag it; a drag cursor appears while dragging.
Drop it in the middle edit box.
Note: The arrows beside the hangers indicate the position (up or down) of each hanger.
Type the character that the rule represents in the rightmost edit box.
Then select the positions that this character can have from the Positions list.

Press OK when finished.
The defined rule will appear in the Defined Rules list.
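A Basic Shapes Rule maps a basic shape plus a hanger (with its up/down arrow) to a character, restricted to certain positions. The Python sketch below is a hypothetical model of that lookup, not ICRA's code. The TICK shape name is borrowed from the equivalence-rule example later in this section, and the four sample rules use the standard Arabic dot patterns on the start/middle tooth shape: beh (one dot below), teh (two dots above), noon (one dot above) and yeh (two dots below).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ShapeRule:
    shape: str                 # basic shape dropped in the leftmost edit box
    hanger: str                # hanger dropped in the middle edit box
    hanger_side: str           # "Up" or "Down", as indicated by the arrow beside the hanger
    character: str             # character typed in the rightmost edit box
    positions: frozenset[str]  # positions checked for this rule


def apply_rules(rules: list[ShapeRule], shape: str, hanger: str,
                side: str, position: str) -> Optional[str]:
    """Return the character for a shape+hanger combination, or None if no rule applies."""
    for rule in rules:
        if ((rule.shape, rule.hanger, rule.hanger_side) == (shape, hanger, side)
                and position in rule.positions):
            return rule.character
    return None


START_OR_MIDDLE = frozenset({"Start", "Middle"})
arabic_rules = [
    ShapeRule("TICK", "Dot", "Down", "\u0628", START_OR_MIDDLE),          # beh
    ShapeRule("TICK", "Double Dots", "Up", "\u062A", START_OR_MIDDLE),    # teh
    ShapeRule("TICK", "Dot", "Up", "\u0646", START_OR_MIDDLE),            # noon
    ShapeRule("TICK", "Double Dots", "Down", "\u064A", START_OR_MIDDLE),  # yeh
]

print(apply_rules(arabic_rules, "TICK", "Dot", "Down", "Middle") == "\u0628")  # True
```

The same basic shape yields four different letters, which is why a diacritic language needs these rules: the hanger and its side, not the main body, carry the distinction.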

The Equivalence Rules give ICRA the meaning of certain sequences of basic shapes.
For example, an equivalence rule in the Arabic language may state that if a TICK (start or middle) is followed by another middle TICK and then by a NOON, these three shapes represent a SEEN. This rule is very important because, when ICRA segments the main body, it cuts the SEEN into two TICKs and a NOON.
An equivalence rule may consist of two or three sequences. A sequence is composed of a basic shape in one or more positions and a hanger in a certain position.
In the example stated above, we have three sequences:
TICK in the Start or Middle position.
TICK in the Middle position.
NOON in the Final position.
To define a new Equivalence Rule, click the New button in the Equivalence Rules window. The Define Equivalence Rule dialog appears, as shown below:

To compose this rule, choose the shape (TICK) from the Shapes list box.
Select one of its positions (Start or Middle) from the Positions list box.
While holding the CTRL key, drag the selection from either list box and drop it in the Members list box.
Repeat these steps for each of the remaining sequences.
Press OK.
The rule is now defined and appears in the following window:
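The SEEN example can be expressed as a small pattern-matching pass over the segmented main body. This Python sketch is illustrative only: `Member`, `EquivalenceRule`, `apply_equivalences` and `combined_position` are hypothetical names. Hangers are omitted for brevity, and the way the merged shape's position is derived from its first and last members is an assumption, not documented ICRA behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    shape: str
    positions: frozenset[str]  # positions this member of the rule may take


@dataclass(frozen=True)
class EquivalenceRule:
    members: tuple[Member, ...]  # two or three members, in reading order
    result: str


def combined_position(first: str, last: str) -> str:
    """Assumed position of the merged shape, from its first and last members."""
    starts = first in ("Start", "Isolated")
    ends = last in ("Final", "Isolated")
    if starts and ends:
        return "Isolated"
    if starts:
        return "Start"
    if ends:
        return "Final"
    return "Middle"


def apply_equivalences(segments: list[tuple[str, str]],
                       rules: list[EquivalenceRule]) -> list[tuple[str, str]]:
    """Replace runs of (shape, position) segments that match a rule with the rule's result."""
    out: list[tuple[str, str]] = []
    i = 0
    while i < len(segments):
        for rule in rules:
            n = len(rule.members)
            window = segments[i:i + n]
            if len(window) == n and all(
                shape == m.shape and pos in m.positions
                for (shape, pos), m in zip(window, rule.members)
            ):
                out.append((rule.result, combined_position(window[0][1], window[-1][1])))
                i += n
                break
        else:  # no rule matched at this segment: keep it unchanged
            out.append(segments[i])
            i += 1
    return out


# TICK (Start or Middle) + TICK (Middle) + NOON (Final) => SEEN
seen_rule = EquivalenceRule(
    members=(
        Member("TICK", frozenset({"Start", "Middle"})),
        Member("TICK", frozenset({"Middle"})),
        Member("NOON", frozenset({"Final"})),
    ),
    result="SEEN",
)
print(apply_equivalences([("TICK", "Start"), ("TICK", "Middle"), ("NOON", "Final")], [seen_rule]))
# [('SEEN', 'Isolated')]
```

Rules are tried in list order at each segment and the first match wins; a real engine might instead prefer the longest matching rule.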
Recognition
To recognize a scanned image that exists in the archiving system as an imported document, follow these steps:
Open the Archive, where the image exists.
Open the parent folder of the image.
Select the image from the Document window of the folder.
From the ICRA sub-menu in the Tools menu, select the Settings item to adjust the settings of the current recognition, if desired. For more details, see the description of the Settings menu of ICRA Gold.

Select the ICRA Mode item either from the Image menu or from the right-click popup menu.
Select the part of the page to be recognized, or the whole page, by blocking it.
To block an area, place the cursor at the beginning of the text area, hold the mouse button down while moving the cursor to the end of the desired area, then release the button.
Tip: To delete a block, select it and then press Delete, or select the Delete item from either the Edit menu or the popup menu. The selected block is deleted.
Select the Recognize item either from the Image menu or from the popup menu.
ICRA starts recognizing the selected image, and the image caption shows the percentage of recognition completed so far.


OCR Results
The results of the recognition of an image will appear to the user as an annotation attached to the document.

To see this result (the annotation), select the Annotations item either from the View menu or from the popup menu.
Select the ICRAresult annotation, then press the Edit button to open it.
If the user wants to teach the system shapes that were not recognized correctly, follow these steps:
- Select or create a font from the Learning Fonts dialog box.
- Select the required shape in the Image window by double-clicking it.
- The Magnifying Glass window appears with the selected shape. Double-click the selected shape in the Magnifying Glass window.
- The Main Body window appears.
- Start the training process.




AramediA


61 Adams Street, Braintree, MA 02184-1906
United States of America (USA)
Tel 1-781-849-0021 Fax 1-781-849-2922

 


 

We Ship All Around the Globe

Copyright © 1995 - 2009 - GnhBos, Incorporated. dba AramediA. All rights reserved.

Dictionary software, multimedia, Bidirectional, English Arabic English, English Dictionaries, Arabic Dictionary,
covers many languages dictionary and applications, please call us for more information at
1-781-849-0021:
Word processing for the following languages is also available: European, Arabic, Hebrew, Cyrillic, Asian and Indian Languages, Albanian, Arabic (includes spell checker), Aramaic, Armenian, Azeri-Arabic, Azeri-Cyrillic, Azeri-Turkish, Bengali, Bohemian (Czech), Bulgarian, Burmese, Byelorussian, Croatian, Danish, Dutch, English, Esperanto, Farsi, French, Finnish, Georgian, German, Greek/Modern, Greek/Classical, Gujarati, Hebrew, Hindi, Hungarian, Icelandic, International Phonetic Alphabet, Inuktitut, Italian, Kannada, Khmer, Ladino, Lao, Latin, Latvian, Lihyanite, Lithuanian, Macedonian, Malayalam, Malay-Jawi, Marathi, Moabite, Mongolian, Nabataean, Nepali, Norwegian, Oriya, Oromo, Pashto, Polish, Portuguese, Punjabi, Rumanian, Russian, Safitic, Sami, Sanskrit, Serbian, Sinhalese, Slovak, Slovenian, South Arabian, Spanish, Swedish, Swiss, Syriac-Eastern, Syriac-Estrangelo, Tagalog, Tamil, Telugu, Thai, Talmudic, Tibetan, Tigrinya, Tigre, Transliteration, Turkish, UK-English, Ugaritic, Ukrainian, Urdu, Vietnamese, Welsh, Wendish, Lusatian, Sorbian, Yiddish.


 




Learn, read, and write these languages: