Tutorial: Learn to Send Images for Recognition with Aspose.OCR Cloud API

Learning Objectives

In this tutorial, you’ll learn how to:

Prepare an image for submission to the Aspose.OCR Cloud API
Configure OCR recognition settings for optimal results
Send an API request to initiate the text extraction process
Handle the API response and task ID for further processing

Prerequisites

Before starting this tutorial, you should have:

An Aspose Cloud account with an active subscription or free trial
Your Client ID and Client Secret from the Aspose Cloud Dashboard
Basic understanding of REST APIs and JSON
Access to a REST client like cURL, Postman, or your programming language of choice

Introduction to Image Recognition Process

The Aspose.OCR Cloud API allows you to extract text from various image formats through a simple REST API. The process follows these basic steps:

1. Get an access token for authorization
2. Send the image for recognition
3. Fetch the recognition results using the returned task ID

This tutorial covers step 2: sending an image for recognition.
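Obtaining the token is outside the scope of this tutorial, but the recognition request needs it. The following is a minimal sketch only. It assumes the token endpoint `https://api.aspose.cloud/connect/token` and the OAuth 2.0 client-credentials form fields commonly used across Aspose Cloud APIs. Neither is stated in this tutorial, so verify both against the official documentation. The function name `build_token_request` is hypothetical.

```python
import urllib.parse
import urllib.request

# ASSUMPTION: token URL and form fields are not given in this tutorial;
# they follow the usual OAuth 2.0 client-credentials pattern.
TOKEN_URL = "https://api.aspose.cloud/connect/token"


def build_token_request(client_id, client_secret):
    """Build (but do not send) a POST request asking for an access token."""
    form = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Under the same assumptions, the response would be JSON with an `access_token` field, which is the value to place in the `Authorization` header later on.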

Understanding the Image Submission Endpoint

The Aspose.OCR Cloud API provides an endpoint dedicated to image recognition:

https://api.aspose.cloud/v5.0/ocr/RecognizeImage

This endpoint accepts POST requests with the image and recognition settings in the request body.

Step 1: Prepare Your Image

First, prepare the image you want to process. The API accepts images as Base64-encoded strings.

Try it yourself:

Select an image containing text (PNG, JPEG, TIFF, etc.)
Convert it to Base64 using an online converter or a code snippet in your preferred language

Sample Code to Convert an Image to Base64

Here’s how you can convert an image to Base64 in different programming languages:

# Python Example
import base64

with open("sample.png", "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
print(encoded_string)

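// Java Example
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

public class EncodeImage {
    public static void main(String[] args) throws IOException {
        // Read the whole image file into memory, then Base64-encode it
        byte[] fileContent = Files.readAllBytes(Paths.get("sample.png"));
        String encodedString = Base64.getEncoder().encodeToString(fileContent);
        System.out.println(encodedString);
    }
}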

// C# Example
using System;
using System.IO;

byte[] imageArray = File.ReadAllBytes("sample.png");
string base64ImageRepresentation = Convert.ToBase64String(imageArray);
Console.WriteLine(base64ImageRepresentation);

Step 2: Configure Recognition Settings

Next, you’ll need to define your recognition settings in a JSON object. These settings control how the OCR engine processes your image.

Key settings include:

language: Specifies the language of the text (default: “English”)
makeSkewCorrect: Automatically corrects image tilt (default: true)
resultType: Specifies the output format (default: “Text”)

Recognition Settings Reference Table

Setting	Type	Default	Description
`language`	string	“English”	Recognition language
`makeSkewCorrect`	boolean	true	Automatically correct image tilt
`rotate`	integer	0	Manually rotate image (in degrees)
`makeBinarization`	boolean	false	Convert image to black and white
`makeContrastCorrection`	boolean	true	Increase image contrast
`makeUpsampling`	boolean	false	Intelligently upscale image
`makeSpellCheck`	boolean	false	Apply spell checking to results
`dsrMode`	string	“Regions”	Document structure analysis algorithm
`dsrConfidence`	string	“Default”	Threshold for filtering content blocks
`resultType`	string	“Text”	Output format
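To experiment with these settings without retyping the whole object, the table can be mirrored in code. This is a sketch, not part of any official SDK: `DEFAULT_SETTINGS` and `build_settings` are hypothetical names, and the defaults are copied from the table above. The helper rejects misspelled setting names and values of the wrong type before anything is sent.

```python
# Defaults copied from the reference table above.
DEFAULT_SETTINGS = {
    "language": "English",
    "makeSkewCorrect": True,
    "rotate": 0,
    "makeBinarization": False,
    "makeContrastCorrection": True,
    "makeUpsampling": False,
    "makeSpellCheck": False,
    "dsrMode": "Regions",
    "dsrConfidence": "Default",
    "resultType": "Text",
}


def build_settings(**overrides):
    """Return a settings dict: the table defaults plus any overrides."""
    unknown = sorted(set(overrides) - set(DEFAULT_SETTINGS))
    if unknown:
        raise KeyError(f"Unknown recognition settings: {unknown}")
    for name, value in overrides.items():
        expected = type(DEFAULT_SETTINGS[name])
        # Exact type match, so a bool is not accepted where an int is expected.
        if type(value) is not expected:
            raise TypeError(f"{name} must be of type {expected.__name__}")
    return {**DEFAULT_SETTINGS, **overrides}
```

For example, `build_settings(rotate=90, makeSpellCheck=True)` returns the defaults with those two values replaced, ready to be used as the `settings` object of a request.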

Step 3: Create Your API Request

Now you’re ready to construct your API request:

cURL Example

curl --request POST --location 'https://api.aspose.cloud/v5.0/ocr/RecognizeImage' \
--header 'Accept: text/plain' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer YOUR_ACCESS_TOKEN' \
--data-raw '{
  "image": "YOUR_BASE64_ENCODED_IMAGE",
  "settings": {
    "language": "English",
    "makeSkewCorrect": true,
    "makeContrastCorrection": true,
    "resultType": "Text"
  }
}'

Replace YOUR_ACCESS_TOKEN with your actual access token and YOUR_BASE64_ENCODED_IMAGE with your Base64-encoded image.
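The same request can be assembled in Python using only the standard library, which also avoids pasting a long Base64 string into a shell command. This is a sketch that mirrors the cURL example above. The function names `build_recognition_request` and `send` are hypothetical, and the 30-second timeout is an arbitrary choice.

```python
import base64
import json
import urllib.request

RECOGNIZE_URL = "https://api.aspose.cloud/v5.0/ocr/RecognizeImage"


def build_recognition_request(image_bytes, access_token, settings):
    """Build (but do not send) the POST request shown in the cURL example."""
    payload = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "settings": settings,
    }
    return urllib.request.Request(
        RECOGNIZE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "text/plain",
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8").strip()
```

A typical call would read the file with `open("sample.png", "rb").read()`, pass the bytes and your token to `build_recognition_request`, and hand the result to `send`. Non-2xx responses raise `urllib.error.HTTPError`, whose `code` attribute holds the HTTP status.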

Expected Response

If the request succeeds, the API returns a UUID string that serves as your task ID:

a197aade-bba9-4c7a-92c7-46851b3dceaa

Keep this ID: you'll need it to fetch your recognition results in the next step of the OCR process.
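Before storing the task ID, it can be worth confirming that the response body really is a UUID rather than an error message. A minimal sketch follows. The name `parse_task_id` is hypothetical, and stripping surrounding double quotes is a defensive assumption in case the body arrives as a JSON-quoted string.

```python
import uuid


def parse_task_id(response_body):
    """Return the task ID in canonical UUID form, or raise ValueError."""
    # Drop surrounding whitespace and, defensively, JSON-style quotes.
    candidate = response_body.strip().strip('"')
    return str(uuid.UUID(candidate))
```

`uuid.UUID` raises `ValueError` for anything that is not a well-formed UUID, so a body such as an error description fails fast instead of being saved as a task ID.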

Note on Evaluation Mode

If you want to try the API without authentication, you can use the evaluation endpoint:

https://api.aspose.cloud/v5.0/ocr/RecognizeImageTrial

This endpoint doesn’t require authentication, but approximately 10% of the words in the results will be masked with asterisks.
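One way to switch between the two endpoints is to choose the URL based on whether a token is available. The sketch below assumes that the evaluation endpoint accepts the same headers and JSON body as the authenticated one, minus the `Authorization` header. This tutorial does not confirm that, so treat it as an assumption. The name `endpoint_and_headers` is hypothetical.

```python
RECOGNIZE_URL = "https://api.aspose.cloud/v5.0/ocr/RecognizeImage"
TRIAL_URL = "https://api.aspose.cloud/v5.0/ocr/RecognizeImageTrial"


def endpoint_and_headers(access_token=None):
    """Pick the endpoint and request headers for the given token, if any."""
    # ASSUMPTION: both endpoints take the same Accept/Content-Type headers.
    headers = {"Accept": "text/plain", "Content-Type": "application/json"}
    if access_token:
        headers["Authorization"] = f"Bearer {access_token}"
        return RECOGNIZE_URL, headers
    # No token: fall back to the evaluation endpoint (results partly masked).
    return TRIAL_URL, headers
```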

Troubleshooting Tips

Error 401 (Unauthorized): Check that your access token is valid, has not expired, and is passed in the Authorization header as "Bearer" followed by the token
Error 400 (Bad Request): Verify that your JSON payload is well formed and that the image is properly Base64-encoded
Command Length Issues: Base64-encoded images can be very long, and command-line tools may hit maximum argument length limits. For large images, build the request in a programming language instead of passing the payload directly to cURL
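Many 400 errors can be caught locally before the request is sent. The following sketch checks that a payload is valid JSON and that its `image` field decodes as strict Base64. The name `check_payload` is hypothetical, and the checks reflect only the payload shape shown in this tutorial, not the full server-side validation.

```python
import base64
import json


def check_payload(payload_text):
    """Return a list of problems found in a request body (empty if none)."""
    try:
        payload = json.loads(payload_text)
    except json.JSONDecodeError as exc:
        return [f"payload is not valid JSON: {exc}"]
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]

    problems = []
    image = payload.get("image")
    if not isinstance(image, str) or not image:
        problems.append('"image" must be a non-empty string')
    else:
        try:
            # validate=True rejects characters outside the Base64 alphabet,
            # such as a leading "data:image/png;base64," prefix.
            base64.b64decode(image, validate=True)
        except ValueError as exc:
            problems.append(f'"image" is not valid Base64: {exc}')
    if "settings" in payload and not isinstance(payload["settings"], dict):
        problems.append('"settings" must be a JSON object')
    return problems
```

An empty list means the payload passed these local checks. A common failure it catches is an image string copied from an online converter with a data-URI prefix still attached.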

What You’ve Learned

In this tutorial, you’ve learned:

How to prepare and encode an image for OCR processing
How to configure recognition settings for optimal results
How to send an image to the Aspose.OCR Cloud API
How to handle the task ID response for later result retrieval

Next Steps

Now that you’ve successfully sent an image for recognition, proceed to the next tutorial to learn How to Fetch Recognition Results from the Aspose.OCR Cloud API.

Further Practice

To reinforce your learning:

Try sending images with different recognition settings to understand their impact
Compare results between different image types (scanned documents vs. photos)
Experiment with the evaluation endpoint to test without authentication

Helpful Resources

Have questions about this tutorial? Feel free to post on our support forum!