Tutorial: Working with the PDF Recognition SDK
Learning Objectives
By the end of this tutorial, you will be able to:
- Set up and configure the Aspose.OCR Cloud SDK in your project
- Authenticate your application with the SDK
- Submit PDF documents for recognition using a simplified workflow
- Process recognition results programmatically
- Implement error handling and reliability patterns
Prerequisites
Before starting this tutorial, make sure you have:
- An Aspose Cloud account with an active subscription or free trial
- Your Client ID and Client Secret from the Aspose Cloud Dashboard
- Development environment set up for your chosen programming language:
  - .NET: Visual Studio or Visual Studio Code with .NET Core 3.1+
  - Java: JDK 8+ and Maven/Gradle
  - Android: Android Studio
  - Node.js: Node.js 12+ and npm/yarn
- Basic knowledge of your chosen programming language
- A sample scanned PDF document for testing
Understanding the SDK Advantage
While you can directly interact with the Aspose.OCR Cloud REST API, using the SDK provides several benefits:
- Simplified authentication
- Type safety and code completion
- Built-in error handling
- Streamlined workflow with fewer lines of code
- Automatic handling of HTTP requests and responses
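To make the "simplified authentication" point concrete, here is a minimal Java sketch of the request body that a direct REST call has to assemble before it can obtain a token. The endpoint and the three form fields are the ones used in the Node.js authentication example of this tutorial. The class and method names are hypothetical, and the sketch only builds the URL-encoded body instead of sending it, so it runs offline.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: the token request body that the SDK assembles on your behalf.
// Endpoint and field names are taken from the Node.js example in Step 2.
public class TokenRequestBody {
    static final String TOKEN_URL = "https://api.aspose.cloud/connect/token";

    // Builds an application/x-www-form-urlencoded body for the
    // OAuth 2.0 client credentials grant.
    static String build(String clientId, String clientSecret) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps field order
        fields.put("client_id", clientId);
        fields.put("client_secret", clientSecret);
        fields.put("grant_type", "client_credentials");
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(encode(field.getKey()) + "=" + encode(field.getValue()));
        }
        return body.toString();
    }

    // URL-encodes a value so characters such as '+', '/' and '=' survive transport.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("POST " + TOKEN_URL);
        System.out.println(build("my-id", "s3cr3t+/="));
    }
}
```

With the SDK, this exchange is reduced to the RecognizePdfApi constructor call that takes your Client ID and Client Secret.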
Step 1: Install the SDK
First, install the SDK for your programming language. In the Java and Android snippets, replace latest.version with the current SDK version number:
.NET
dotnet add package Aspose.OCR-Cloud
Java
<dependency>
<groupId>com.aspose</groupId>
<artifactId>aspose-ocr-cloud</artifactId>
<version>latest.version</version>
</dependency>
Android
Add to your app’s build.gradle:
dependencies {
implementation 'com.aspose:aspose-ocr-cloud-android:latest.version'
}
Node.js
npm install aspose_ocr_cloud_5_0_api --save
Try it yourself:
Install the SDK for your preferred programming language and verify that it’s correctly integrated with your project.
Step 2: Configure Authentication
Every SDK interaction begins with authentication using your Client ID and Client Secret:
.NET
using Aspose.OCR.Cloud.SDK.Api;
using Aspose.OCR.Cloud.SDK.Model;
// Initialize the API with your credentials
RecognizePdfApi recognizePdfApi = new RecognizePdfApi("<Client Id>", "<Client Secret>");
Java
import Aspose.OCR.Cloud.SDK.RecognizePdfApi;
import Aspose.OCR.Cloud.SDK.model.*;
// Initialize the API with your credentials
RecognizePdfApi api = new RecognizePdfApi("<Client Id>", "<Client Secret>");
Android
// Initialize the API with your credentials
RecognizePdfApi api = new RecognizePdfApi("<Client Id>", "<Client Secret>");
Node.js
const AsposeOcrCloud10040Api = require('aspose_ocr_cloud_5_0_api');
const request = require('aspose_ocr_cloud_5_0_api/node_modules/request');
// Get an access token (OAuth 2.0 client credentials grant)
function connect(){
  return new Promise(function(resolve, reject){
    request.post({
      headers: {"Content-Type": "application/x-www-form-urlencoded", "Accept": "application/json;charset=UTF-8"},
      url: "https://api.aspose.cloud/connect/token",
      form: {"client_id": "<Client Id>", "client_secret": "<Client Secret>", "grant_type": "client_credentials"}
    }, (err, res, body) => {
      if (err) {
        // Stop here so the promise is not also resolved
        return reject(err);
      }
      // body is the raw JSON response; its access_token field holds the token
      resolve(body);
    });
  });
}
Learning Checkpoint:
What are the benefits of using the SDK over directly calling the REST API? List at least three.
Answer:
Benefits include simplified authentication, type safety and code completion, built-in error handling, a streamlined workflow with fewer lines of code, and automatic handling of HTTP requests and responses.
Step 3: Prepare the PDF Document
Next, load your PDF document into memory:
.NET
// Read source PDF document to array of bytes
byte[] pdfData = File.ReadAllBytes("source.pdf");
Java
import java.nio.file.Files;
import java.nio.file.Paths;
// Read source PDF document to array of bytes
// (Paths.get works on JDK 8; Path.of requires JDK 11+)
byte[] pdfData = Files.readAllBytes(Paths.get("source.pdf"));
Android
// Read the source PDF from the app's assets into a byte array.
// available() is not guaranteed to return the full size, so read in a loop.
String pdfFileName = "source.pdf";
byte[] pdfData;
try (InputStream inputStream = context.getAssets().open(pdfFileName);
     ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
    byte[] chunk = new byte[8192];
    int bytesRead;
    while ((bytesRead = inputStream.read(chunk)) != -1) {
        outputStream.write(chunk, 0, bytesRead);
    }
    pdfData = outputStream.toByteArray();
}
Node.js
const path = require("path");
const fs = require("fs");
// Read the source PDF document into a Buffer
const filePath = path.normalize("source.pdf");
const fileData = fs.readFileSync(filePath);
Step 4: Configure Recognition Settings
Set up the recognition parameters according to your document’s characteristics:
.NET
// Specify recognition settings
OCRSettingsRecognizePdf recognitionSettings = new OCRSettingsRecognizePdf {
    Language = Language.English,
    ResultType = ResultType.Text,
    MakeSpellCheck = true,
    MakeContrastCorrection = true
};
Java
// Specify recognition settings
OCRSettingsRecognizePdf settings = new OCRSettingsRecognizePdf();
settings.setLanguage(Language.ENGLISH);
settings.setResultType(ResultType.TEXT);
settings.setMakeSpellCheck(true);
settings.setMakeContrastCorrection(true);
Android
// Specify recognition settings
OCRSettingsRecognizePdf settings = new OCRSettingsRecognizePdf();
settings.setLanguage(Language.ENGLISH);
settings.setResultType(ResultType.TEXT);
settings.setMakeSpellCheck(true);
settings.setMakeContrastCorrection(true);
Node.js
// Specify recognition settings
let settings = new AsposeOcrCloud10040Api.OCRSettingsRecognizePdf();
settings.Language = "English";
settings.ResultType = "Text";
settings.MakeSpellCheck = true;
settings.MakeContrastCorrection = true;
Try it yourself:
Experiment with different recognition settings based on your document’s characteristics. For example, if your document contains small text, set MakeUpsampling to true.
Step 5: Submit the PDF for Recognition
Now, create a request and submit your PDF for recognition:
.NET
// Send PDF for recognition
OCRRecognizePdfBody source = new OCRRecognizePdfBody(pdfData, recognitionSettings);
string taskID = recognizePdfApi.PostRecognizePdf(source);
Java
// Send PDF for recognition
OCRRecognizePdfBody requestBody = new OCRRecognizePdfBody();
requestBody.setPdf(pdfData);
requestBody.setSettings(settings);
String taskId = api.postRecognizePdf(requestBody);
Android
// Send PDF for recognition
OCRRecognizePdfBody requestBody = new OCRRecognizePdfBody();
requestBody.setPdf(pdfData);
requestBody.setSettings(settings);
String taskId = api.postRecognizePdf(requestBody);
Node.js
// Send PDF for recognition (the file content is passed as a Base64 string)
let requestData = new AsposeOcrCloud10040Api.OCRRecognizePdfBody(fileData.toString('base64'), settings);
api.postRecognizePdf(requestData, (err, res, body) => {
  if (err) {
    console.error(err);
    return;
  }
  // res holds the ID of the recognition task
  const taskId = res;
  // Continue with fetching results
});
Step 6: Fetch and Process Recognition Results
Once you have a task ID, you can fetch and process the recognition results. Recognition runs asynchronously, so always check the task status before reading the results:
.NET
// Fetch recognition result
OCRResponse result = recognizePdfApi.GetRecognizePdf(taskID);
// Process and display the result
if (result.TaskStatus == TaskStatus.Completed) {
    // Access the recognized text
    string recognizedText = Encoding.UTF8.GetString(result.Results[0].Data);
    Console.WriteLine(recognizedText);
    // Alternatively, save to a file
    File.WriteAllText("recognized_text.txt", recognizedText);
}
Java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
// Fetch recognition result
OCRResponse apiResponse = api.getRecognizePdf(taskId);
// Process and display the result
if (apiResponse.getTaskStatus() == TaskStatus.COMPLETED) {
    // Access the recognized text
    String recognizedText = new String(apiResponse.getResults().get(0).getData(), StandardCharsets.UTF_8);
    System.out.println(recognizedText);
    // Alternatively, save to a file (written with the same UTF-8 encoding)
    Files.write(Paths.get("recognized_text.txt"), recognizedText.getBytes(StandardCharsets.UTF_8));
}
Android
import android.util.Log;
import android.widget.TextView;
import java.nio.charset.StandardCharsets;
// Fetch recognition result. Android does not allow network calls on the
// main thread, so run this code on a background thread.
OCRResponse apiResponse = api.getRecognizePdf(taskId);
// Process and display the result
if (apiResponse.getTaskStatus() == TaskStatus.COMPLETED) {
    // Access the recognized text
    String recognizedText = new String(apiResponse.getResults().get(0).getData(), StandardCharsets.UTF_8);
    Log.d("OCR", recognizedText);
    // Display the text in your app (UI updates must run on the main thread)
    runOnUiThread(() -> {
        TextView resultTextView = findViewById(R.id.resultTextView);
        resultTextView.setText(recognizedText);
    });
}
Node.js
// Fetching recognition result
function callGetPdfFunction(id){
  return new Promise(function(resolve, reject){
    api.getRecognizePdf(id, (err, res, body) => {
      if (err) {
        // Stop here so the promise is not also resolved
        return reject(err);
      }
      resolve(body);
    });
  });
}
// Processing and displaying the result
function processResult(body){
  console.log('Processing results...');
  const json_res = JSON.parse(body['text']);
  // Check task status; if the task is still pending, fetch it again later (see Step 7)
  if (json_res['taskStatus'] === 'Completed') {
    // The result data is Base64-encoded; decode it as UTF-8 text
    // (Buffer works on Node.js 12+, unlike the global atob())
    const recognizedText = Buffer.from(json_res['results'][0]['data'], 'base64').toString('utf8');
    console.log(recognizedText);
    // Alternatively, save to a file
    fs.writeFileSync('recognized_text.txt', recognizedText);
  }
}
// Recognition flow: authenticate, submit the PDF, wait, then fetch the result
connect().then(
  access_token => callPostPdfRecognizeFunction(access_token, "source.pdf")
).then(
  // Give the service time to process the task before fetching the result
  id => new Promise(resolve => setTimeout(() => resolve(id), 1000))
).then(
  id => callGetPdfFunction(id)
).then(
  body => processResult(body)
).catch(
  err => console.error('Recognition failed:', err)
);
Try it yourself:
Implement the complete recognition flow in your chosen programming language, from authentication to displaying the results.
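The Node.js result-processing example treats results[0].data as Base64 and decodes it before printing. If you call the REST API directly from Java instead of through the SDK, java.util.Base64 can do the same job. The sketch below assumes the field holds standard (not URL-safe) Base64 of UTF-8 text; the class and method names are hypothetical. Decoding the bytes explicitly as UTF-8 matters for non-English documents: atob() returns a binary string with one character per byte, so multi-byte characters come out garbled unless they are decoded afterwards.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch: decode the Base64 "data" field of a raw REST response into text.
public class ResultDecoder {
    // Decodes standard Base64 and interprets the resulting bytes as UTF-8.
    static String decodeText(String base64Data) {
        byte[] bytes = Base64.getDecoder().decode(base64Data);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "SGVsbG8sIHdvcmxk" is the Base64 encoding of "Hello, world"
        System.out.println(decodeText("SGVsbG8sIHdvcmxk"));

        // Round trip with non-ASCII text (unicode escapes avoid source-encoding issues)
        String original = "Gr\u00fc\u00dfe, caf\u00e9";
        String encoded = Base64.getEncoder()
                .encodeToString(original.getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeText(encoded).equals(original));
    }
}
```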
Step 7: Implementing Error Handling and Reliability
In a production environment, you should implement proper error handling and reliability patterns:
.NET
// Error handling and reliability pattern:
// poll until the task finishes, back off exponentially between polls,
// retry transient failures, and fail fast on terminal task states.
public string RecognizePdfWithRetry(string pdfPath, int maxRetries = 3, int pollingIntervalMs = 1000)
{
    try
    {
        // Read PDF file
        byte[] pdfData = File.ReadAllBytes(pdfPath);
        // Configure recognition settings
        OCRSettingsRecognizePdf settings = new OCRSettingsRecognizePdf {
            Language = Language.English,
            ResultType = ResultType.Text
        };
        // Submit for recognition
        OCRRecognizePdfBody source = new OCRRecognizePdfBody(pdfData, settings);
        string taskID = recognizePdfApi.PostRecognizePdf(source);
        // Poll for results with retry logic
        int attempts = 0;
        while (attempts < maxRetries)
        {
            attempts++;
            // Wait before polling
            Thread.Sleep(pollingIntervalMs);
            // Increase the waiting time for subsequent polls (exponential backoff)
            pollingIntervalMs *= 2;
            OCRResponse result;
            try
            {
                // Fetch result
                result = recognizePdfApi.GetRecognizePdf(taskID);
            }
            catch (Exception ex) when (attempts < maxRetries)
            {
                // Transient failure (for example, a network error): retry
                Console.WriteLine($"Attempt {attempts} failed: {ex.Message}. Retrying...");
                continue;
            }
            // Check status; terminal states are reported immediately, not retried
            if (result.TaskStatus == TaskStatus.Completed)
            {
                // Return recognized text
                return Encoding.UTF8.GetString(result.Results[0].Data);
            }
            if (result.TaskStatus == TaskStatus.Error)
            {
                throw new InvalidOperationException($"Recognition failed: {result.Error?.Messages?[0]}");
            }
            if (result.TaskStatus == TaskStatus.NotExist)
            {
                throw new InvalidOperationException("Task not found. It may have expired.");
            }
            // Otherwise the task is still in progress: poll again
        }
        throw new TimeoutException($"Failed to get recognition results after {maxRetries} attempts.");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error in PDF recognition process: {ex.Message}");
        throw;
    }
}
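A Java version of the same reliability pattern might look like the sketch below. To keep it self-contained and testable without network access, it does not call the SDK: the Status enum, the Poll class, and the Callable passed in are hypothetical stand-ins for api.getRecognizePdf(taskId) and the SDK's task states. Terminal states (ERROR, NOT_EXIST) end polling at once rather than being retried, while exceptions thrown by the fetch call are treated as transient and retried until the attempt budget runs out.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Sketch: poll for a result with exponential backoff, retrying transient
// failures but stopping immediately on terminal task states.
public class BackoffPoller {
    // Hypothetical stand-ins for the SDK's task states and response.
    enum Status { PENDING, PROCESSING, COMPLETED, ERROR, NOT_EXIST }

    static final class Poll {
        final Status status;
        final String text;

        Poll(Status status, String text) {
            this.status = status;
            this.text = text;
        }
    }

    static String pollWithBackoff(Callable<Poll> fetch, int maxAttempts, long initialDelayMs) {
        long delayMs = initialDelayMs;
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            sleep(delayMs);
            delayMs *= 2; // double the wait before the next poll
            Poll poll;
            try {
                poll = fetch.call();
            } catch (Exception e) {
                lastFailure = e; // transient failure: try again
                continue;
            }
            switch (poll.status) {
                case COMPLETED:
                    return poll.text;
                case ERROR:
                    throw new IllegalStateException("Recognition failed");
                case NOT_EXIST:
                    throw new IllegalStateException("Task not found. It may have expired.");
                default:
                    break; // PENDING or PROCESSING: poll again
            }
        }
        throw new IllegalStateException("No result after " + maxAttempts + " attempts", lastFailure);
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while polling", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fake fetcher: one network error, one PROCESSING response, then success
        Callable<Poll> fake = () -> {
            calls[0]++;
            if (calls[0] == 1) throw new IOException("connection reset");
            if (calls[0] == 2) return new Poll(Status.PROCESSING, null);
            return new Poll(Status.COMPLETED, "recognized text");
        };
        System.out.println(pollWithBackoff(fake, 5, 10));
    }
}
```

Passing the fetch operation in as a Callable keeps the retry policy separate from the SDK call, so the same loop could wrap the real getRecognizePdf call once its response is mapped to the Poll type.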
Troubleshooting Tips
- SDK Installation Issues: Make sure you’re using the correct version of the SDK compatible with your programming language and environment.
- Authentication Errors: Double-check your Client ID and Client Secret, and ensure they are active in your Aspose Cloud account.
- Recognition Quality Issues: Experiment with different recognition settings to improve results for challenging documents.
- Memory Limitations: For large PDF files, consider processing them page by page or implementing streaming approaches.
What You’ve Learned
Congratulations! In this tutorial, you’ve learned how to:
- Set up and configure the Aspose.OCR Cloud SDK in your project
- Authenticate your application with the SDK
- Submit PDF documents for recognition using a simplified workflow
- Fetch and process recognition results programmatically
- Implement error handling and reliability patterns
Next Steps
Now that you’re familiar with using the SDK for PDF recognition, you can explore more advanced topics:
Tutorial: Customizing Recognition Settings
Further Practice
To reinforce what you’ve learned, try these exercises:
- Create a complete application that handles multiple PDF files in batch mode
- Implement a solution that compares recognition results with different settings
- Build a simple user interface that allows users to upload PDFs and view recognition results
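For the batch exercise, one possible starting skeleton in Java is shown below. It collects every *.pdf file in a folder, passes each file's bytes to a recognizer function, and records failures per file instead of aborting the whole run. The recognizer is a hypothetical parameter where you would plug in the SDK calls from Steps 3 to 6; passing it in lets you try the skeleton with a fake recognizer and no network access.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Sketch: batch skeleton that isolates per-file failures.
public class BatchRecognizer {
    // Outcome of a batch run: recognized text per file, plus failures.
    static final class BatchResult {
        final Map<String, String> texts = new TreeMap<>();
        final Map<String, String> failures = new TreeMap<>();
    }

    // Runs the (hypothetical) recognizer on every *.pdf file in the folder.
    static BatchResult run(Path folder, Function<byte[], String> recognize) throws IOException {
        BatchResult result = new BatchResult();
        List<Path> pdfs = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(folder, "*.pdf")) {
            for (Path pdf : stream) {
                pdfs.add(pdf);
            }
        }
        for (Path pdf : pdfs) {
            String name = pdf.getFileName().toString();
            try {
                result.texts.put(name, recognize.apply(Files.readAllBytes(pdf)));
            } catch (Exception e) {
                // Record the failure and continue with the next file
                result.failures.put(name, String.valueOf(e.getMessage()));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path folder = Files.createTempDirectory("pdf-batch");
        Files.write(folder.resolve("a.pdf"), new byte[] {1, 2, 3});
        Files.write(folder.resolve("b.pdf"), new byte[0]);
        Files.write(folder.resolve("notes.txt"), new byte[] {9}); // ignored: not a PDF
        // Fake recognizer: rejects empty files, otherwise reports the size
        BatchResult r = run(folder, bytes -> {
            if (bytes.length == 0) throw new IllegalArgumentException("empty file");
            return bytes.length + " bytes";
        });
        System.out.println(r.texts);
        System.out.println(r.failures);
    }
}
```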
Helpful Resources
Have questions about this tutorial? Feel free to post in our support forum for assistance!