OCRPro.ServerSide.Request

The server-side OCR runs as a service. Documents must be saved to disk on the server before they can be OCR'd. Once they are saved, send a request to the service with the following structure:

//OCR Request Body
{
    productKey: "",
    inputFile:
    [
        "d:\\input\\1.tif",
        "d:\\input\\2.pdf",
        "d:\\input\\3.png"
    ],
    settings:
    {
        recognitionModule: "auto", /*optional*/
        languages: "eng,arabic",
        recognitionMethod: "File",
        threadCount: "2", /*optional*/
        outputFormat: "IOTPDF",
        pdfVersion: "1.7", /*optional*/
        pdfAVersion: "pdf/a-2a", /*optional*/
        redaction:
        {
            findText: "AAA",
            findTextFlags: 1,
            findTextAction: 0
        }
    },
    zones: /*optional*/
    [
        [100, 100, 200, 300], [100, 600, 100, 200]
    ],
    outputFile: "d:\\temp\\ocrresult.pdf"
}
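
As a rough sketch of how such a request could be assembled and sent from JavaScript: the helper names buildOcrRequest and sendOcrRequest, the endpoint URL parameter, and the use of HTTP POST with a JSON body are all assumptions made for illustration and are not part of the documented API. Check your service configuration for the actual address and transport.

```javascript
// Hypothetical helper: assembles a request body from the fields described in this section.
function buildOcrRequest({ productKey, inputFile, settings, zones, outputFile }) {
  // Sanity check (an assumption, not a documented rule): at least one input path is needed.
  if (!Array.isArray(inputFile) || inputFile.length === 0) {
    throw new Error("inputFile must be a non-empty array of absolute paths");
  }
  const body = { productKey, inputFile, settings };
  // zones and outputFile are optional, so only include them when provided.
  if (zones !== undefined) body.zones = zones;
  if (outputFile !== undefined) body.outputFile = outputFile;
  return body;
}

// Assumption: the service accepts the body as JSON over HTTP POST at endpointUrl.
async function sendOcrRequest(endpointUrl, body) {
  const response = await fetch(endpointUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body)
  });
  if (!response.ok) {
    throw new Error("OCR request failed with HTTP status " + response.status);
  }
  return response;
}
```

When outputFile is omitted, the OCR result is returned in the HTTP response, so the caller would read it from the object returned by sendOcrRequest.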

productKey

The product key which is generated from an OCR license.

inputFile

An array of strings, each the absolute path of a file that has been uploaded and saved on the server. Supported formats include BMP, JPG, TIF, PDF, PNG, JBIG2, JPEG2000 and PCX. Note that each backslash in a path must be escaped, i.e. written as '\\' rather than '\'.
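
To illustrate the escaping, the sketch below shows that a doubled backslash in a JavaScript string literal stands for a single backslash in the actual path, and that serializing the value to JSON doubles it again. The variable names are illustrative only.

```javascript
// In source code each path separator is written as two backslashes...
const tifPath = "d:\\input\\1.tif";

// ...but the string itself holds one backslash per separator.
const segments = tifPath.split("\\");   // ["d:", "input", "1.tif"]
const charCount = tifPath.length;       // 14 characters, not 16

// JSON text needs the same escaping, which JSON.stringify applies automatically.
const jsonText = JSON.stringify({ inputFile: [tifPath] });
console.log(jsonText);                  // {"inputFile":["d:\\input\\1.tif"]}
```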

settings

Settings for the OCR.

recognitionModule

Specifies which module is used for this OCR. Allowed values are:

Module        Description
mostaccurate  The most accurate module, but also the slowest.
fastest       The fastest module, but not very accurate.
balanced      Maintains a balance between accuracy and performance.
auto          The most suitable module (mostaccurate, balanced or fastest) is selected automatically. This is the default value.

languages

Specifies the language or languages for this OCR, for example "eng" for English or "arabic" for Arabic. To set multiple languages, separate them with commas, e.g. "eng,arabic".

recognitionMethod

Specifies how the OCR is performed.

Method  Description
Page    OCR one page at a time. This is the default value.
File    OCR one file at a time. This is faster and supports multiple threads, but detailed information such as the coordinates of a particular letter is not available, and zonal OCR is not supported in this mode.

threadCount

Specifies the maximum number of threads to be used for this OCR. The default value is -1 which means all possible threads will be used. This setting is only valid when recognitionMethod is set to 'File'.
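
The interaction between recognitionMethod, threadCount and the top-level zones field can be checked before a request is sent. The function below is a hypothetical pre-flight check, not something the service provides; it assumes the method names are compared exactly as written in this section ("Page" and "File").

```javascript
// Hypothetical helper: returns a list of problems found in a request object.
function checkRecognitionSettings(request) {
  const problems = [];
  const settings = request.settings || {};
  // "Page" is the documented default when recognitionMethod is omitted.
  const method = settings.recognitionMethod || "Page";

  if (method !== "Page" && method !== "File") {
    problems.push("recognitionMethod must be 'Page' or 'File'");
  }
  // Zonal OCR is documented as unsupported in File mode.
  if (method === "File" && Array.isArray(request.zones) && request.zones.length > 0) {
    problems.push("zones are not supported when recognitionMethod is 'File'");
  }
  // threadCount is documented as valid only in File mode.
  if (method !== "File" && settings.threadCount !== undefined) {
    problems.push("threadCount only applies when recognitionMethod is 'File'");
  }
  return problems;
}
```

An empty array means none of these conflicts were found. A request that sets recognitionMethod to "File" and also supplies zones would be reported by this check.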

outputFormat

Specifies the file type of the OCR output.

Format      Description
TXTS        Standard text file.
TXTCSV      CSV text file.
TXTF        Formatted text file.
XML         Simple XML file.
IOTPDF      Image over text PDF file.
IOTPDF_MRC  Image over text PDF with MRC technology.

pdfVersion

Specifies the version of the PDF file when outputFormat is set to either IOTPDF or IOTPDF_MRC. Allowed versions are 1.0 to 1.7; the default is 1.5.

pdfAVersion

Specifies the PDF/A version of the output file when outputFormat is set to either IOTPDF or IOTPDF_MRC. Allowed values are "pdf/a-1a", "pdf/a-1b", "pdf/a-2a", "pdf/a-2b", "pdf/a-2u", "pdf/a-3a", "pdf/a-3b" and "pdf/a-3u".

redaction

Specifies how the redaction is done.

Option          Description
findText        A string to specify what to find.
findTextFlags   Specifies how the text is found. The allowed values are 1 (WHOLEWORD), 2 (MATCHCASE), 4 (FUZZYMATCH).
findTextAction  Specifies the action once the text is found. The allowed values are 0 (HIGHLIGHT), 1 (STRIKEOUT) or 2 (MARKFORREDACT).
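
The numeric values can be given names to make a request easier to read. In the sketch below the constant objects and the makeRedaction function are hypothetical helpers; only the numbers and their labels come from the list above. The values 1, 2 and 4 resemble bit flags, but whether they may be combined is not stated here, so the helper accepts a single flag only.

```javascript
// Named constants for the documented numeric values (the names mirror the labels above).
const FindTextFlags = Object.freeze({ WHOLEWORD: 1, MATCHCASE: 2, FUZZYMATCH: 4 });
const FindTextAction = Object.freeze({ HIGHLIGHT: 0, STRIKEOUT: 1, MARKFORREDACT: 2 });

// Hypothetical helper: builds the redaction object from readable names.
function makeRedaction(findText, flagName, actionName) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  if (!has(FindTextFlags, flagName)) {
    throw new Error("Unknown findTextFlags name: " + flagName);
  }
  if (!has(FindTextAction, actionName)) {
    throw new Error("Unknown findTextAction name: " + actionName);
  }
  return {
    findText: findText,
    findTextFlags: FindTextFlags[flagName],
    findTextAction: FindTextAction[actionName]
  };
}

// Highlight whole-word matches of "AAA".
const sampleRedaction = makeRedaction("AAA", "WHOLEWORD", "HIGHLIGHT");
```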

zones

Specifies which zones are to be OCR'd. There can be multiple zones per page. The value is an array of zones, and each zone is an array of coordinates in the order [left, top, right, bottom].
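
Because each zone is given by its edges rather than by a width and height, a small conversion helper can be handy. The function rectToZone and its input shape { x, y, width, height } are hypothetical; only the output order [left, top, right, bottom] is taken from the description above.

```javascript
// Hypothetical helper: converts an origin-plus-size rectangle into a zone array.
function rectToZone(rect) {
  if (rect.width <= 0 || rect.height <= 0) {
    throw new Error("width and height must be positive");
  }
  const left = rect.x;
  const top = rect.y;
  const right = rect.x + rect.width;    // right edge, not the width
  const bottom = rect.y + rect.height;  // bottom edge, not the height
  return [left, top, right, bottom];
}

// A 100 x 200 rectangle whose top-left corner is at (100, 100).
const exampleZones = [rectToZone({ x: 100, y: 100, width: 100, height: 200 })];
console.log(JSON.stringify(exampleZones)); // [[100,100,200,300]]
```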

outputFile

Specifies where the output file will be placed. If specified, all OCR'd files will be merged as one file and saved at the specified location. Otherwise, the result will be returned in the HTTP Response.