Tutorial: Working with the PDF Recognition SDK

Learning Objectives

By the end of this tutorial, you will be able to:

Set up and configure the Aspose.OCR Cloud SDK in your project
Authenticate your application with the SDK
Submit PDF documents for recognition using a simplified workflow
Process recognition results programmatically
Implement error handling and reliability patterns

Prerequisites

Before starting this tutorial, make sure you have:

An Aspose Cloud account with an active subscription or free trial
Your Client ID and Client Secret from the Aspose Cloud Dashboard
Development environment set up for your chosen programming language:
- .NET: Visual Studio or Visual Studio Code with .NET Core 3.1+
- Java: JDK 8+ and Maven/Gradle
- Android: Android Studio
- Node.js: Node.js 12+ and npm/yarn
Basic knowledge of your chosen programming language
A sample scanned PDF document for testing

Understanding the SDK Advantage

While you can directly interact with the Aspose.OCR Cloud REST API, using the SDK provides several benefits:

Simplified authentication
Type safety and code completion
Built-in error handling
Streamlined workflow with fewer lines of code
Automatic handling of HTTP requests and responses

Step 1: Install the SDK

First, you need to install the SDK for your programming language:

.NET

dotnet add package Aspose.OCR-Cloud

Java

<dependency>
  <groupId>com.aspose</groupId>
  <artifactId>aspose-ocr-cloud</artifactId>
  <version>latest.version</version>
</dependency>

Android

Add to your app’s build.gradle:

dependencies {
  implementation 'com.aspose:aspose-ocr-cloud-android:latest.version'
}

Java

// Fetch recognition result
OCRResponse apiResponse = api.getRecognizePdf(taskId);

// Process and display the result
if (apiResponse.getTaskStatus() == TaskStatus.COMPLETED) {
    // Access the recognized text
    String recognizedText = new String(apiResponse.getResults().get(0).getData(), StandardCharsets.UTF_8);
    System.out.println(recognizedText);
    
    // Alternatively, save to a file
    Files.write(Paths.get("recognized_text.txt"), recognizedText.getBytes());
}

Android

// Fetch recognition result
OCRResponse apiResponse = api.getRecognizePdf(taskId);

// Process and display the result
if (apiResponse.getTaskStatus() == TaskStatus.COMPLETED) {
    // Access the recognized text
    String recognizedText = new String(apiResponse.getResults().get(0).getData(), StandardCharsets.UTF_8);
    System.out.println(recognizedText);
    
    // Alternatively, save to a file or display in your app
    TextView resultTextView = findViewById(R.id.resultTextView);
    resultTextView.setText(recognizedText);
}

Node.js

// Fetching recognition result
function callGetPdfFunction(id){
    return new Promise(function(resolve, reject){
        api.getRecognizePdf(id, (err, res, body) => {
            if (err) {
                reject(err);
            }
            resolve(body);
        })
    })
}

// Processing and displaying the result
function processResult(body){
    console.log('Processing results...')
    const json_res = JSON.parse(body['text']);
    
    // Check task status
    if (json_res['taskStatus'] === 'Completed') {
        // Access the recognized text
        const recognizedText = atob(json_res['results'][0]['data']);
        console.log(recognizedText);
        
        // Alternatively, save to a file
        fs.writeFileSync('recognized_text.txt', recognizedText);
    }
}

// Recognition flow
connect().then(
    access_token => callPostPdfRecognizeFunction(access_token, "source.pdf")
).then(
    x => new Promise(resolve => setTimeout(() => resolve(x), 1000))
).then(
    id => callGetPdfFunction(id)
).then(
    body => processResult(body)
)

Try it yourself:

Implement the complete recognition flow in your chosen programming language, from authentication to displaying the results.

Node.js

npm install aspose_ocr_cloud_5_0_api --save

Try it yourself:

Install the SDK for your preferred programming language and verify that it’s correctly integrated with your project.

Step 2: Configure Authentication

Every SDK interaction begins with authentication using your Client ID and Client Secret:

.NET

using Aspose.OCR.Cloud.SDK.Api;
using Aspose.OCR.Cloud.SDK.Model;

// Initialize the API with your credentials
RecognizePdfApi recognizePdfApi = new RecognizePdfApi("<Client Id>", "<Client Secret>");

Java

import Aspose.OCR.Cloud.SDK.RecognizePdfApi;
import Aspose.OCR.Cloud.SDK.model.*;

// Initialize the API with your credentials
RecognizePdfApi api = new RecognizePdfApi("<Client Id>", "<Client Secret>");

Android

// Initialize the API with your credentials
RecognizePdfApi api = new RecognizePdfApi("<Client Id>", "<Client Secret>");

Node.js

const AsposeOcrCloud10040Api = require('aspose_ocr_cloud_5_0_api');
const request = require('aspose_ocr_cloud_5_0_api/node_modules/request');

// Getting access token
function connect(){
    return new Promise(function(resolve, reject){
        request.post({
            headers: {"ContentType": "application/x-www-form-urlencoded", "Accept": "application/json;charset=UTF-8"},
            url: "https://api.aspose.cloud/connect/token",
            form: JSON.parse('{"client_id": "<Client Id>", "client_secret": "<Client Secret>", "grant_type": "client_credentials"}')
        }, (err, res, body) => {
            if (err) {
                reject(err);
            }
            resolve(body);
        });
    });
}

Learning Checkpoint:

What are the benefits of using the SDK over directly calling the REST API? List at least three.

Answer

Benefits include simplified authentication, type safety and code completion, built-in error handling, streamlined workflow with fewer lines of code, and automatic handling of HTTP requests and responses.

Step 3: Prepare the PDF Document

Next, you need to load your PDF document into memory:

.NET

// Read source PDF document to array of bytes
byte[] pdfData = File.ReadAllBytes("source.pdf");

Java

// Read source PDF document to array of bytes
byte[] pdfData = Files.readAllBytes(Path.of("source.pdf"));

Android

// Read source PDF document to array of bytes
String pdfFileName = "source.pdf";
InputStream inputStream = context.getAssets().open(pdfFileName);
int size = inputStream.available();
byte[] pdfData = new byte[size];
inputStream.read(pdfData);
inputStream.close();

Node.js

const path = require("path");
const fs = require("fs");

// Read source PDF document
var filePath = path.normalize("source.pdf");
var buffer = Buffer.alloc(1024 * 50);
var fileData = fs.readFileSync(filePath, buffer);

Step 4: Configure Recognition Settings

Set up the recognition parameters according to your document’s characteristics:

.NET

// Specify recognition settings
OCRSettingsRecognizePdf recognitionSettings = new OCRSettingsRecognizePdf {
    Language = Language.English,
    ResultType = ResultType.Text,
    MakeSpellCheck = true,
    MakeContrastCorrection = true
};

Java

// Specify recognition settings
OCRSettingsRecognizePdf settings = new OCRSettingsRecognizePdf();
settings.setLanguage(Language.ENGLISH);
settings.setResultType(ResultType.TEXT);
settings.setMakeSpellCheck(true);
settings.setMakeContrastCorrection(true);

Android

// Specify recognition settings
OCRSettingsRecognizePdf settings = new OCRSettingsRecognizePdf();
settings.setLanguage(Language.ENGLISH);
settings.setResultType(ResultType.TEXT);
settings.setMakeSpellCheck(true);
settings.setMakeContrastCorrection(true);

Node.js

// Specify recognition settings
let settings = new AsposeOcrCloud10040Api.OCRSettingsRecognizePdf();
settings.Language = "English";
settings.ResultType = "Text";
settings.MakeSpellCheck = true;
settings.MakeContrastCorrection = true;

Try it yourself:

Experiment with different recognition settings based on your document’s characteristics. For example, if your document contains small text, set MakeUpsampling to true.

Step 5: Submit the PDF for Recognition

Now, create a request and submit your PDF for recognition:

.NET

// Send PDF for recognition
OCRRecognizePdfBody source = new OCRRecognizePdfBody(pdfData, recognitionSettings);
string taskID = recognizePdfApi.PostRecognizePdf(source);

Java

// Send PDF for recognition
OCRRecognizePdfBody requestBody = new OCRRecognizePdfBody();
requestBody.setPdf(pdfData);
requestBody.setSettings(settings);
String taskId = api.postRecognizePdf(requestBody);

Android

// Send PDF for recognition
OCRRecognizePdfBody requestBody = new OCRRecognizePdfBody();
requestBody.setPdf(pdfData);
requestBody.setSettings(settings);
String taskId = api.postRecognizePdf(requestBody);

Node.js

// Send PDF for recognition
let requestData = new AsposeOcrCloud10040Api.OCRRecognizePdfBody(fileData.toString('base64'), settings);
api.postRecognizePdf(requestData, (err, res, body) => {
    if (err) {
        console.error(err);
        return;
    }
    const taskId = res;
    // Continue with fetching results
});

Step 6: Fetch and Process Recognition Results

Once you have a task ID, you can fetch and process the recognition results:

.NET

// Fetch recognition result
OCRResponse result = recognizePdfApi.GetRecognizePdf(taskID);

// Process and display the result
if (result.TaskStatus == TaskStatus.Completed) {
    // Access the recognized text
    string recognizedText = Encoding.UTF8.GetString(result.Results[0].Data);
    Console.WriteLine(recognizedText);
    
    // Alternatively, save to a file
    File.WriteAllText("recognized_text.txt", recognizedText);
}

Step 7: Implementing Error Handling and Reliability

In a production environment, you should implement proper error handling and reliability patterns:

.NET

// Error handling and reliability pattern
public string RecognizePdfWithRetry(string pdfPath, int maxRetries = 3, int pollingIntervalMs = 1000)
{
    try {
        // Read PDF file
        byte[] pdfData = File.ReadAllBytes(pdfPath);
        
        // Configure recognition settings
        OCRSettingsRecognizePdf settings = new OCRSettingsRecognizePdf {
            Language = Language.English,
            ResultType = ResultType.Text
        };
        
        // Submit for recognition
        OCRRecognizePdfBody source = new OCRRecognizePdfBody(pdfData, settings);
        string taskID = recognizePdfApi.PostRecognizePdf(source);
        
        // Poll for results with retry logic
        int attempts = 0;
        while (attempts < maxRetries)
        {
            attempts++;
            
            // Wait before polling
            Thread.Sleep(pollingIntervalMs);
            
            try {
                // Fetch result
                OCRResponse result = recognizePdfApi.GetRecognizePdf(taskID);
                
                // Check status
                if (result.TaskStatus == TaskStatus.Completed)
                {
                    // Return recognized text
                    return Encoding.UTF8.GetString(result.Results[0].Data);
                }
                else if (result.TaskStatus == TaskStatus.Error)
                {
                    throw new Exception($"Recognition failed: {result.Error?.Messages[0]}");
                }
                else if (result.TaskStatus == TaskStatus.NotExist)
                {
                    throw new Exception("Task not found. It may have expired.");
                }
                
                // Increase waiting time for subsequent polls (exponential backoff)
                pollingIntervalMs *= 2;
            }
            catch (Exception ex) when (attempts < maxRetries)
            {
                Console.WriteLine($"Attempt {attempts} failed: {ex.Message}. Retrying...");
            }
        }
        
        throw new Exception($"Failed to get recognition results after {maxRetries} attempts.");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error in PDF recognition process: {ex.Message}");
        throw;
    }
}

Troubleshooting Tips

SDK Installation Issues: Make sure you’re using the correct version of the SDK compatible with your programming language and environment.
Authentication Errors: Double-check your Client ID and Client Secret, and ensure they are active in your Aspose Cloud account.
Recognition Quality Issues: Experiment with different recognition settings to improve results for challenging documents.
Memory Limitations: For large PDF files, consider processing them page by page or implementing streaming approaches.

What You’ve Learned

Congratulations! In this tutorial, you’ve learned how to:

Set up and configure the Aspose.OCR Cloud SDK in your project
Authenticate your application with the SDK
Submit PDF documents for recognition using a simplified workflow
Fetch and process recognition results programmatically
Implement error handling and reliability patterns

Next Steps

Now that you’re familiar with using the SDK for PDF recognition, you can explore more advanced topics:

Tutorial: Customizing Recognition Settings

Further Practice

To reinforce what you’ve learned, try these exercises:

Create a complete application that handles multiple PDF files in batch mode
Implement a solution that compares recognition results with different settings
Build a simple user interface that allows users to upload PDFs and view recognition results

Helpful Resources

Have questions about this tutorial? Feel free to post in our support forum for assistance!