Tutorial: Working with the PDF Recognition SDK

Learning Objectives

By the end of this tutorial, you will be able to:

  • Set up and configure the Aspose.OCR Cloud SDK in your project
  • Authenticate your application with the SDK
  • Submit PDF documents for recognition using a simplified workflow
  • Process recognition results programmatically
  • Implement error handling and reliability patterns

Prerequisites

Before starting this tutorial, make sure you have:

  • An Aspose Cloud account with an active subscription or free trial
  • Your Client ID and Client Secret from the Aspose Cloud Dashboard
  • Development environment set up for your chosen programming language:
    • .NET: Visual Studio or Visual Studio Code with .NET Core 3.1+
    • Java: JDK 8+ and Maven/Gradle
    • Android: Android Studio
    • Node.js: Node.js 12+ and npm/yarn
  • Basic knowledge of your chosen programming language
  • A sample scanned PDF document for testing

Understanding the SDK Advantage

While you can directly interact with the Aspose.OCR Cloud REST API, using the SDK provides several benefits:

  • Simplified authentication
  • Type safety and code completion
  • Built-in error handling
  • Streamlined workflow with fewer lines of code
  • Automatic handling of HTTP requests and responses

Step 1: Install the SDK

First, you need to install the SDK for your programming language:

.NET

dotnet add package Aspose.OCR-Cloud

Java

<dependency>
  <groupId>com.aspose</groupId>
  <artifactId>aspose-ocr-cloud</artifactId>
  <version>latest.version</version>
</dependency>

Android

Add to your app’s build.gradle:

dependencies {
  implementation 'com.aspose:aspose-ocr-cloud-android:latest.version'
}

Java

// Fetch recognition result
OCRResponse apiResponse = api.getRecognizePdf(taskId);

// Process and display the result
if (apiResponse.getTaskStatus() == TaskStatus.COMPLETED) {
    // Access the recognized text
    String recognizedText = new String(apiResponse.getResults().get(0).getData(), StandardCharsets.UTF_8);
    System.out.println(recognizedText);
    
    // Alternatively, save to a file
    Files.write(Paths.get("recognized_text.txt"), recognizedText.getBytes(StandardCharsets.UTF_8));
}

Android

// Fetch recognition result
OCRResponse apiResponse = api.getRecognizePdf(taskId);

// Process and display the result
if (apiResponse.getTaskStatus() == TaskStatus.COMPLETED) {
    // Access the recognized text
    String recognizedText = new String(apiResponse.getResults().get(0).getData(), StandardCharsets.UTF_8);
    System.out.println(recognizedText);
    
    // Alternatively, display the text in your app
    TextView resultTextView = findViewById(R.id.resultTextView);
    resultTextView.setText(recognizedText);
}

Node.js

// Fetching recognition result
function callGetPdfFunction(id){
    return new Promise(function(resolve, reject){
        api.getRecognizePdf(id, (err, res, body) => {
            if (err) {
                reject(err);
                return; // do not fall through to resolve()
            }
            resolve(body);
        })
    })
}

// Processing and displaying the result
function processResult(body){
    console.log('Processing results...')
    const json_res = JSON.parse(body['text']);
    
    // Check task status
    if (json_res['taskStatus'] === 'Completed') {
        // Access the recognized text (Buffer decodes Base64 on every supported Node.js version)
        const recognizedText = Buffer.from(json_res['results'][0]['data'], 'base64').toString('utf8');
        console.log(recognizedText);
        
        // Alternatively, save to a file
        fs.writeFileSync('recognized_text.txt', recognizedText);
    }
}

// Recognition flow
connect().then(
    access_token => callPostPdfRecognizeFunction(access_token, "source.pdf")
).then(
    x => new Promise(resolve => setTimeout(() => resolve(x), 1000))
).then(
    id => callGetPdfFunction(id)
).then(
    body => processResult(body)
).catch(
    err => console.error('Recognition failed:', err)
)
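
The example above reads only the first entry of the results array. If a response holds more than one entry, each can be decoded the same way. The following is a minimal sketch under that assumption; decodeResults is a hypothetical helper, not part of the SDK, and the field names (taskStatus, results, data) are taken from the example above.

```javascript
// Hypothetical helper: decodes every Base64 entry in a parsed response.
// Field names (taskStatus, results, data) follow the example above.
function decodeResults(response) {
    if (response.taskStatus !== "Completed") {
        throw new Error(`Task is not complete (status: ${response.taskStatus})`);
    }
    // Buffer handles Base64 decoding without relying on a global atob()
    return (response.results || []).map(
        entry => Buffer.from(entry.data, "base64").toString("utf8")
    );
}
```

A caller could join the returned strings before writing them to a file, for example with decodeResults(json_res).join("\n").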

Try it yourself:

Implement the complete recognition flow in your chosen programming language, from authentication to displaying the results.

Node.js

npm install aspose_ocr_cloud_5_0_api --save

Try it yourself:

Install the SDK for your preferred programming language and verify that it’s correctly integrated with your project.

Step 2: Configure Authentication

Every SDK interaction begins with authentication using your Client ID and Client Secret:

.NET

using Aspose.OCR.Cloud.SDK.Api;
using Aspose.OCR.Cloud.SDK.Model;

// Initialize the API with your credentials
RecognizePdfApi recognizePdfApi = new RecognizePdfApi("<Client Id>", "<Client Secret>");

Java

import Aspose.OCR.Cloud.SDK.RecognizePdfApi;
import Aspose.OCR.Cloud.SDK.model.*;

// Initialize the API with your credentials
RecognizePdfApi api = new RecognizePdfApi("<Client Id>", "<Client Secret>");

Android

// Initialize the API with your credentials
RecognizePdfApi api = new RecognizePdfApi("<Client Id>", "<Client Secret>");

Node.js

const AsposeOcrCloud10040Api = require('aspose_ocr_cloud_5_0_api');
const request = require('aspose_ocr_cloud_5_0_api/node_modules/request');

// Getting access token
function connect(){
    return new Promise(function(resolve, reject){
        request.post({
            headers: {"Content-Type": "application/x-www-form-urlencoded", "Accept": "application/json;charset=UTF-8"},
            url: "https://api.aspose.cloud/connect/token",
            form: {client_id: "<Client Id>", client_secret: "<Client Secret>", grant_type: "client_credentials"}
        }, (err, res, body) => {
            if (err) {
                reject(err);
                return; // do not fall through to resolve()
            }
            resolve(body);
        });
    });
}
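
The placeholders above put the Client ID and Client Secret directly in source code, which is convenient for a first test but easy to leak through version control. One possible alternative is to read them from the environment. This is a minimal sketch: the helper loadCredentials and the variable names ASPOSE_CLIENT_ID and ASPOSE_CLIENT_SECRET are assumptions chosen for illustration, not names defined by the SDK.

```javascript
// Hypothetical helper: reads credentials from an environment-like object.
// The variable names below are illustrative, not defined by the SDK.
function loadCredentials(env = process.env) {
    const clientId = env.ASPOSE_CLIENT_ID;
    const clientSecret = env.ASPOSE_CLIENT_SECRET;
    if (!clientId || !clientSecret) {
        throw new Error("Set ASPOSE_CLIENT_ID and ASPOSE_CLIENT_SECRET before running.");
    }
    return { clientId, clientSecret };
}
```

The returned values can then replace the "<Client Id>" and "<Client Secret>" placeholders when building the token request.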

Learning Checkpoint:

What are the benefits of using the SDK over directly calling the REST API? List at least three.

Answer: Benefits include simplified authentication, type safety and code completion, built-in error handling, streamlined workflow with fewer lines of code, and automatic handling of HTTP requests and responses.

Step 3: Prepare the PDF Document

Next, you need to load your PDF document into memory:

.NET

// Read source PDF document to array of bytes
byte[] pdfData = File.ReadAllBytes("source.pdf");

Java

// Read source PDF document to array of bytes
// Paths.get works on JDK 8+; Path.of would require JDK 11
byte[] pdfData = Files.readAllBytes(Paths.get("source.pdf"));

Android

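// Read source PDF document from the app's assets to array of bytes
// (uses java.io.InputStream and java.io.ByteArrayOutputStream)
String pdfFileName = "source.pdf";
byte[] pdfData;
try (InputStream inputStream = context.getAssets().open(pdfFileName);
     ByteArrayOutputStream output = new ByteArrayOutputStream()) {
    byte[] chunk = new byte[8192];
    int bytesRead;
    // available() is only an estimate, so read until the end of the stream
    while ((bytesRead = inputStream.read(chunk)) != -1) {
        output.write(chunk, 0, bytesRead);
    }
    pdfData = output.toByteArray();
}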

Node.js

const path = require("path");
const fs = require("fs");

// Read source PDF document; without an encoding, readFileSync returns a Buffer
const filePath = path.normalize("source.pdf");
const fileData = fs.readFileSync(filePath);
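
Before submitting, it can help to confirm that the loaded bytes actually look like a PDF, since a wrong path or a renamed file would otherwise only surface as a recognition failure later. PDF files begin with the ASCII marker "%PDF-". The helper below is a hypothetical pre-flight check under that assumption, not part of the SDK, and it only inspects the header rather than validating the whole file.

```javascript
// Hypothetical pre-flight check: PDF files begin with the ASCII marker "%PDF-".
// This inspects only the first bytes; it does not validate the document.
function looksLikePdf(data) {
    const marker = Buffer.from("%PDF-", "ascii");
    return Buffer.isBuffer(data)
        && data.length >= marker.length
        && data.subarray(0, marker.length).equals(marker);
}
```

For example, looksLikePdf(fileData) could guard the submission call and report a clear error when it returns false.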

Step 4: Configure Recognition Settings

Set up the recognition parameters according to your document’s characteristics:

.NET

// Specify recognition settings
OCRSettingsRecognizePdf recognitionSettings = new OCRSettingsRecognizePdf {
    Language = Language.English,
    ResultType = ResultType.Text,
    MakeSpellCheck = true,
    MakeContrastCorrection = true
};

Java

// Specify recognition settings
OCRSettingsRecognizePdf settings = new OCRSettingsRecognizePdf();
settings.setLanguage(Language.ENGLISH);
settings.setResultType(ResultType.TEXT);
settings.setMakeSpellCheck(true);
settings.setMakeContrastCorrection(true);

Android

// Specify recognition settings
OCRSettingsRecognizePdf settings = new OCRSettingsRecognizePdf();
settings.setLanguage(Language.ENGLISH);
settings.setResultType(ResultType.TEXT);
settings.setMakeSpellCheck(true);
settings.setMakeContrastCorrection(true);

Node.js

// Specify recognition settings
let settings = new AsposeOcrCloud10040Api.OCRSettingsRecognizePdf();
settings.Language = "English";
settings.ResultType = "Text";
settings.MakeSpellCheck = true;
settings.MakeContrastCorrection = true;

Try it yourself:

Experiment with different recognition settings based on your document’s characteristics. For example, if your document contains small text, set MakeUpsampling to true.
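
When experimenting, it may be easier to keep one set of defaults and override individual options per document. The following is a rough sketch of that idea. It uses a plain object in place of the SDK's OCRSettingsRecognizePdf class so that it runs on its own; buildSettings is a hypothetical helper, and the MakeUpsampling default of false is an assumption made for this sketch.

```javascript
// Defaults mirror the settings used in this step. A plain object stands in
// for the SDK's OCRSettingsRecognizePdf class; MakeUpsampling: false is an
// assumption made for this sketch, not a documented SDK default.
const DEFAULT_SETTINGS = Object.freeze({
    Language: "English",
    ResultType: "Text",
    MakeSpellCheck: true,
    MakeContrastCorrection: true,
    MakeUpsampling: false
});

// Hypothetical helper: returns a new settings object with overrides applied,
// rejecting unknown keys so a misspelled option does not go unnoticed.
function buildSettings(overrides = {}) {
    for (const key of Object.keys(overrides)) {
        if (!(key in DEFAULT_SETTINGS)) {
            throw new Error(`Unknown recognition setting: ${key}`);
        }
    }
    return { ...DEFAULT_SETTINGS, ...overrides };
}
```

For a document with small text, buildSettings({ MakeUpsampling: true }) yields the defaults with upsampling switched on; the values can then be copied onto the SDK settings object.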

Step 5: Submit the PDF for Recognition

Now, create a request and submit your PDF for recognition:

.NET

// Send PDF for recognition
OCRRecognizePdfBody source = new OCRRecognizePdfBody(pdfData, recognitionSettings);
string taskID = recognizePdfApi.PostRecognizePdf(source);

Java

// Send PDF for recognition
OCRRecognizePdfBody requestBody = new OCRRecognizePdfBody();
requestBody.setPdf(pdfData);
requestBody.setSettings(settings);
String taskId = api.postRecognizePdf(requestBody);

Android

// Send PDF for recognition
OCRRecognizePdfBody requestBody = new OCRRecognizePdfBody();
requestBody.setPdf(pdfData);
requestBody.setSettings(settings);
String taskId = api.postRecognizePdf(requestBody);

Node.js

// Send PDF for recognition
let requestData = new AsposeOcrCloud10040Api.OCRRecognizePdfBody(fileData.toString('base64'), settings);
api.postRecognizePdf(requestData, (err, res, body) => {
    if (err) {
        console.error(err);
        return;
    }
    const taskId = res;
    // Continue with fetching results
});
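
The callback form above makes it awkward to chain the submission with the later fetch step. One way to bridge the two is a small Promise wrapper. This is a sketch only: submitForRecognition is a hypothetical helper, and it assumes an api object exposing postRecognizePdf(requestData, callback) with the (err, res, body) callback shape shown above, where res carries the task ID.

```javascript
// Hypothetical wrapper: turns the callback-style submit call into a Promise.
// `api` is any object exposing postRecognizePdf(requestData, callback) with
// the (err, res, body) callback shape shown in this step.
function submitForRecognition(api, requestData) {
    return new Promise((resolve, reject) => {
        api.postRecognizePdf(requestData, (err, res) => {
            if (err) {
                reject(err);
                return; // do not fall through to resolve()
            }
            resolve(res); // this step treats `res` as the task ID
        });
    });
}
```

With this in place, the task ID is available through submitForRecognition(api, requestData).then(taskId => ...) or with await inside an async function.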

Step 6: Fetch and Process Recognition Results

Once you have a task ID, you can fetch and process the recognition results:

.NET

// Fetch recognition result
OCRResponse result = recognizePdfApi.GetRecognizePdf(taskID);

// Process and display the result
if (result.TaskStatus == TaskStatus.Completed) {
    // Access the recognized text
    string recognizedText = Encoding.UTF8.GetString(result.Results[0].Data);
    Console.WriteLine(recognizedText);
    
    // Alternatively, save to a file
    File.WriteAllText("recognized_text.txt", recognizedText);
}

Step 7: Implement Error Handling and Reliability

In a production environment, you should implement proper error handling and reliability patterns:

.NET

// Error handling and reliability pattern
public string RecognizePdfWithRetry(string pdfPath, int maxRetries = 3, int pollingIntervalMs = 1000)
{
    try {
        // Read PDF file
        byte[] pdfData = File.ReadAllBytes(pdfPath);
        
        // Configure recognition settings
        OCRSettingsRecognizePdf settings = new OCRSettingsRecognizePdf {
            Language = Language.English,
            ResultType = ResultType.Text
        };
        
        // Submit for recognition
        OCRRecognizePdfBody source = new OCRRecognizePdfBody(pdfData, settings);
        string taskID = recognizePdfApi.PostRecognizePdf(source);
        
        // Poll for results with retry logic
        int attempts = 0;
        while (attempts < maxRetries)
        {
            attempts++;
            
            // Wait before polling
            Thread.Sleep(pollingIntervalMs);
            
            OCRResponse result;
            try {
                // Fetch result
                result = recognizePdfApi.GetRecognizePdf(taskID);
            }
            catch (Exception ex) when (attempts < maxRetries)
            {
                // Retry only failures of the fetch call itself, still backing off
                Console.WriteLine($"Attempt {attempts} failed: {ex.Message}. Retrying...");
                pollingIntervalMs *= 2;
                continue;
            }
            
            // Check status; terminal failures are thrown outside the retry filter
            if (result.TaskStatus == TaskStatus.Completed)
            {
                // Return recognized text
                return Encoding.UTF8.GetString(result.Results[0].Data);
            }
            else if (result.TaskStatus == TaskStatus.Error)
            {
                throw new Exception($"Recognition failed: {result.Error?.Messages[0]}");
            }
            else if (result.TaskStatus == TaskStatus.NotExist)
            {
                throw new Exception("Task not found. It may have expired.");
            }
            
            // Increase waiting time for subsequent polls (exponential backoff)
            pollingIntervalMs *= 2;
        }
        
        throw new Exception($"Failed to get recognition results after {maxRetries} attempts.");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error in PDF recognition process: {ex.Message}");
        throw;
    }
}
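
The same polling idea can be expressed in Node.js. The following is a minimal sketch, not an SDK feature: pollUntilDone and sleep are hypothetical helpers, fetchStatus stands for any async function that returns the parsed task response, and the status strings "Error" and "NotExist" are assumed to match the .NET enum names used above.

```javascript
// Hypothetical helpers for polling with exponential backoff.
// `fetchStatus` is any async function returning { taskStatus, ... };
// the "Error" and "NotExist" strings are assumed from the .NET enum names.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function pollUntilDone(fetchStatus, { maxAttempts = 3, initialDelayMs = 1000 } = {}) {
    let delayMs = initialDelayMs;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        await sleep(delayMs); // wait before each poll
        delayMs *= 2;         // double the wait for the next poll
        let response;
        try {
            response = await fetchStatus();
        } catch (err) {
            if (attempt === maxAttempts) throw err; // out of retries
            continue;                               // fetch failed: poll again
        }
        if (response.taskStatus === "Completed") return response;
        if (response.taskStatus === "Error" || response.taskStatus === "NotExist") {
            // Terminal statuses are not retried
            throw new Error(`Recognition ended with status ${response.taskStatus}`);
        }
    }
    throw new Error(`No result after ${maxAttempts} attempts`);
}
```

Only failures of the fetch call itself are retried; a terminal status ends the loop immediately, so a failed or expired task is reported without further waiting.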

Troubleshooting Tips

  • SDK Installation Issues: Make sure you’re using the correct version of the SDK compatible with your programming language and environment.
  • Authentication Errors: Double-check your Client ID and Client Secret, and ensure they are active in your Aspose Cloud account.
  • Recognition Quality Issues: Experiment with different recognition settings to improve results for challenging documents.
  • Memory Limitations: For large PDF files, consider processing them page by page or implementing streaming approaches.

What You’ve Learned

Congratulations! In this tutorial, you’ve learned how to:

  • Set up and configure the Aspose.OCR Cloud SDK in your project
  • Authenticate your application with the SDK
  • Submit PDF documents for recognition using a simplified workflow
  • Fetch and process recognition results programmatically
  • Implement error handling and reliability patterns

Next Steps

Now that you’re familiar with using the SDK for PDF recognition, you can explore more advanced topics:

Tutorial: Customizing Recognition Settings

Further Practice

To reinforce what you’ve learned, try these exercises:

  1. Create a complete application that handles multiple PDF files in batch mode
  2. Implement a solution that compares recognition results with different settings
  3. Build a simple user interface that allows users to upload PDFs and view recognition results

Helpful Resources

Have questions about this tutorial? Feel free to post in our support forum for assistance!