Tutorial: Complete Table Recognition Workflow
Learning Objectives
In this tutorial, you’ll learn how to:
- Build a complete end-to-end table recognition solution
- Handle the entire workflow from authentication to result processing
- Implement best practices for production-ready applications
- Process and transform the recognized data into usable formats
- Create a robust error handling system for OCR operations
Prerequisites
Before starting this tutorial, you should have:
- Completed the previous tutorials in this series
- Client credentials (Client ID and Client Secret) from the Aspose Cloud Dashboard
- A development environment set up for your chosen programming language
- Basic understanding of asynchronous programming concepts
- Sample table images for testing
Introduction
So far in our tutorial series, we’ve covered the individual components of table recognition with Aspose.OCR Cloud. In this advanced tutorial, we’ll bring everything together to build a complete, production-ready workflow for table recognition. This includes authentication, image preprocessing, recognition, polling for results, error handling, and processing the extracted data.
Real-World Scenario
A healthcare provider receives hundreds of lab reports daily as PDFs containing tables of patient test results. The provider needs to extract this tabular data and integrate it with their electronic health record (EHR) system. By implementing a complete table recognition workflow, they can automate the process, reduce manual data-entry errors, and improve overall efficiency.
Step 1: Design the Workflow Architecture
Before we start coding, let’s outline the complete workflow:
- Authentication: Obtain and manage access tokens
- Image Preparation: Extract or prepare table images
- Recognition Submission: Send tables for OCR processing
- Status Monitoring: Implement polling with appropriate intervals
- Result Retrieval: Fetch and decode recognition results
- Data Processing: Transform extracted data into usable formats
- Error Handling: Implement robust error detection and recovery
- Logging: Track the process for monitoring and debugging
Step 2: Implement the Complete Workflow
Let’s implement this workflow using C# as our example language:
using Aspose.OCR.Cloud.SDK.Api;
using Aspose.OCR.Cloud.SDK.Model;
using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
namespace TableRecognitionWorkflow
{
public class TableRecognizer
{
private readonly string _clientId;
private readonly string _clientSecret;
private readonly RecognizeTableApi _recognizeTableApi;
private readonly ILogger _logger;
// Configuration options with defaults
public class Options
{
public string Language { get; set; } = "English";
public bool MakeSkewCorrect { get; set; } = true;
public bool MakeContrastCorrection { get; set; } = true;
public bool MakeSpellCheck { get; set; } = true;
public string ResultTypeTable { get; set; } = "Csv";
public int PollingIntervalMs { get; set; } = 1000;
public int MaxPollingAttempts { get; set; } = 30;
}
public TableRecognizer(string clientId, string clientSecret, ILogger logger = null)
{
_clientId = clientId;
_clientSecret = clientSecret;
_logger = logger ?? new ConsoleLogger();
// Initialize the API with credentials
_recognizeTableApi = new RecognizeTableApi(clientId, clientSecret);
_logger.Info("TableRecognizer initialized with client credentials");
}
public async Task<string> RecognizeTableAsync(string imagePath, Options options = null)
{
options = options ?? new Options();
try
{
_logger.Info($"Starting table recognition for {imagePath}");
// Step 1: Validate the image file
if (!File.Exists(imagePath))
{
throw new FileNotFoundException($"Table image not found: {imagePath}");
}
// Step 2: Read the image file
_logger.Info("Reading image file");
byte[] tableImage = await File.ReadAllBytesAsync(imagePath);
// Step 3: Configure recognition settings
_logger.Info("Configuring recognition settings");
OCRSettingsRecognizeTable settings = new OCRSettingsRecognizeTable
{
Language = (Language)Enum.Parse(typeof(Language), options.Language, true),
MakeSkewCorrect = options.MakeSkewCorrect,
MakeContrastCorrection = options.MakeContrastCorrection,
MakeSpellCheck = options.MakeSpellCheck,
ResultTypeTable = (ResultTypeTable)Enum.Parse(typeof(ResultTypeTable), options.ResultTypeTable, true)
};
// Step 4: Submit the table for recognition
_logger.Info("Submitting table for recognition");
OCRRecognizeTableBody requestBody = new OCRRecognizeTableBody(tableImage, settings);
string taskId = await Task.Run(() => _recognizeTableApi.PostRecognizeTable(requestBody));
_logger.Info($"Table submitted successfully. Task ID: {taskId}");
// Step 5: Poll for recognition results
_logger.Info("Waiting for recognition to complete");
OCRResponse result = await PollForResultsAsync(taskId, options.PollingIntervalMs, options.MaxPollingAttempts);
// Step 6: Process the recognition results
if (result.TaskStatus == "Completed")
{
_logger.Info("Recognition completed successfully");
string extractedData = Encoding.UTF8.GetString(result.Results[0].Data);
return extractedData;
}
else
{
_logger.Error($"Recognition failed with status: {result.TaskStatus}");
if (result.Error != null && result.Error.Messages != null)
{
foreach (var message in result.Error.Messages)
{
_logger.Error($"Error message: {message}");
}
}
throw new Exception($"Recognition failed with status: {result.TaskStatus}");
}
}
catch (Exception ex)
{
_logger.Error($"Error during table recognition: {ex.Message}");
throw;
}
}
private async Task<OCRResponse> PollForResultsAsync(string taskId, int pollingIntervalMs, int maxAttempts)
{
int attempts = 0;
OCRResponse response;
do
{
attempts++;
_logger.Info($"Polling attempt {attempts}/{maxAttempts}");
// Get current status
response = await Task.Run(() => _recognizeTableApi.GetRecognizeTable(taskId));
_logger.Info($"Current status: {response.TaskStatus}");
// Check if processing is complete
if (response.TaskStatus != "Pending" && response.TaskStatus != "Processing")
{
return response;
}
// Wait before next polling attempt
await Task.Delay(pollingIntervalMs);
} while (attempts < maxAttempts);
throw new TimeoutException($"Recognition timed out after {maxAttempts} polling attempts");
}
}
// Simple logger interface and implementation
public interface ILogger
{
void Info(string message);
void Error(string message);
}
public class ConsoleLogger : ILogger
{
public void Info(string message)
{
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine($"[INFO] {DateTime.Now}: {message}");
Console.ResetColor();
}
public void Error(string message)
{
Console.ForegroundColor = ConsoleColor.Red;
Console.WriteLine($"[ERROR] {DateTime.Now}: {message}");
Console.ResetColor();
}
}
// Example usage in a console application
class Program
{
static async Task Main(string[] args)
{
try
{
// Get credentials from environment variables or config
string clientId = Environment.GetEnvironmentVariable("ASPOSE_CLIENT_ID") ?? "your-client-id";
string clientSecret = Environment.GetEnvironmentVariable("ASPOSE_CLIENT_SECRET") ?? "your-client-secret";
// Initialize the table recognizer
var recognizer = new TableRecognizer(clientId, clientSecret);
// Configure recognition options
var options = new TableRecognizer.Options
{
MakeSkewCorrect = true,
MakeContrastCorrection = true,
MakeSpellCheck = true,
ResultTypeTable = "Csv",
PollingIntervalMs = 1500,
MaxPollingAttempts = 20
};
// Process the table image
string imagePath = "sample_table.png";
string extractedData = await recognizer.RecognizeTableAsync(imagePath, options);
// Process the extracted CSV data
Console.WriteLine("\nExtracted Table Data:");
Console.WriteLine(extractedData);
// You could further process the CSV data here
// For example, parse it into a data structure or save to a database
}
catch (Exception ex)
{
Console.WriteLine($"An error occurred: {ex.Message}");
}
}
}
}
Step 3: Enhancing the Workflow with Data Processing
Once you’ve extracted the table data, you’ll often need to transform it into a more usable format. Here’s an example of how to parse the CSV data into a structured object:
// Add this to your Program.cs
using System.Collections.Generic;
using System.Linq;
using Microsoft.VisualBasic.FileIO; // TextFieldParser; on .NET Core / .NET 5+, reference the Microsoft.VisualBasic package
// Parse CSV data into a list of dictionaries
public static List<Dictionary<string, string>> ParseCsvData(string csvData)
{
var result = new List<Dictionary<string, string>>();
using (TextFieldParser parser = new TextFieldParser(new StringReader(csvData)))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
// Read header row
string[] headers = parser.ReadFields();
if (headers == null)
{
return result; // empty input: nothing to parse
}
// Read data rows
while (!parser.EndOfData)
{
string[] fields = parser.ReadFields();
var row = new Dictionary<string, string>();
for (int i = 0; i < headers.Length && i < fields.Length; i++)
{
row[headers[i]] = fields[i];
}
result.Add(row);
}
}
return result;
}
// Example usage:
// var parsedData = ParseCsvData(extractedData);
// foreach (var row in parsedData)
// {
// Console.WriteLine($"Row: {string.Join(", ", row.Select(kv => $"{kv.Key}={kv.Value}"))}");
// }
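If you need typed objects rather than dictionaries, for example to hand rows to an EHR integration or a database layer, you can map the parsed rows onto a small model class. This is only a sketch: the LabResult type and the column names used below are hypothetical and must match the headers in your own tables.
// Hypothetical target type; adjust the properties and column names to your table layout
public class LabResult
{
    public string PatientId { get; set; }
    public string TestName { get; set; }
    public string Value { get; set; }
    public string Units { get; set; }
}
public static List<LabResult> ToLabResults(List<Dictionary<string, string>> rows)
{
    var results = new List<LabResult>();
    foreach (var row in rows)
    {
        results.Add(new LabResult
        {
            // Column names here are examples; use the headers from your own tables
            PatientId = row.GetValueOrDefault("Patient ID", ""),
            TestName = row.GetValueOrDefault("Test", ""),
            Value = row.GetValueOrDefault("Result", ""),
            Units = row.GetValueOrDefault("Units", "")
        });
    }
    return results;
}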
Step 4: Implementing Retry Logic for Resilience
For production applications, it’s important to implement retry logic to handle transient failures:
// Add this method to your TableRecognizer class
private async Task<T> RetryAsync<T>(Func<Task<T>> operation, int maxRetries = 3, int initialDelayMs = 1000)
{
int retryCount = 0;
int delay = initialDelayMs;
while (true)
{
try
{
return await operation();
}
catch (Exception ex)
{
retryCount++;
if (retryCount >= maxRetries)
{
_logger.Error($"Operation failed after {maxRetries} attempts: {ex.Message}");
throw;
}
_logger.Info($"Operation failed, retrying in {delay}ms (Attempt {retryCount}/{maxRetries})");
await Task.Delay(delay);
// Exponential backoff
delay *= 2;
}
}
}
// Then use it in your operations:
// string taskId = await RetryAsync(() => Task.Run(() => _recognizeTableApi.PostRecognizeTable(requestBody)));
Step 5: Building a Complete Application
Let’s enhance our workflow with a more complete application structure that includes:
- Configuration management
- Multiple file processing
- Result export to various formats
// Add this to Program.cs
static async Task ProcessBatchAsync(TableRecognizer recognizer, string inputFolder, string outputFolder)
{
// Create output directory if it doesn't exist
Directory.CreateDirectory(outputFolder);
// Get all image files
string[] imageExtensions = { "*.jpg", "*.jpeg", "*.png", "*.bmp", "*.tiff", "*.tif" };
var imageFiles = imageExtensions
.SelectMany(ext => Directory.GetFiles(inputFolder, ext))
.ToList();
Console.WriteLine($"Found {imageFiles.Count} images to process");
// Process each image
foreach (var imagePath in imageFiles)
{
try
{
string fileName = Path.GetFileNameWithoutExtension(imagePath);
Console.WriteLine($"Processing {fileName}...");
// Recognize table
string csvData = await recognizer.RecognizeTableAsync(imagePath);
// Save CSV result
string csvPath = Path.Combine(outputFolder, $"{fileName}.csv");
await File.WriteAllTextAsync(csvPath, csvData);
// Also save as JSON for easy integration with web applications
var parsedData = ParseCsvData(csvData);
string jsonData = System.Text.Json.JsonSerializer.Serialize(parsedData,
new System.Text.Json.JsonSerializerOptions { WriteIndented = true });
string jsonPath = Path.Combine(outputFolder, $"{fileName}.json");
await File.WriteAllTextAsync(jsonPath, jsonData);
Console.WriteLine($"Successfully processed {fileName}");
}
catch (Exception ex)
{
Console.WriteLine($"Failed to process {imagePath}: {ex.Message}");
}
}
Console.WriteLine("Batch processing complete!");
}
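You can call the batch method from the Main method shown earlier. The folder names here are placeholders for illustration:
// Example usage in Main (folder names are placeholders):
// var recognizer = new TableRecognizer(clientId, clientSecret);
// await ProcessBatchAsync(recognizer, "input_tables", "output_results");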
Learning Checkpoint
Before continuing, make sure you understand:
- How to design a complete workflow for table recognition
- Proper error handling and retry strategies for production applications
- How to transform extracted CSV data into structured objects
- Batch processing multiple table images
Troubleshooting Common Issues
Authentication and Connection Issues
- Implement proper token management with refresh and expiry handling (see the token cache sketch after this list)
- Add timeouts and circuit breakers for network operations
- Use proper exception handling to identify connection issues
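The SDK's RecognizeTableApi manages tokens for you, but if you call the REST API directly it helps to cache the access token and refresh it shortly before expiry. The sketch below is a minimal example; the token endpoint URL is taken from the Aspose Cloud documentation, so verify it against the current docs before relying on it.
// Minimal token cache for direct REST calls (requires System.Net.Http).
// The SDK handles authentication automatically; this sketch is only for raw HTTP usage.
public class TokenProvider
{
    private static readonly HttpClient _http = new HttpClient { Timeout = TimeSpan.FromSeconds(30) };
    private readonly string _clientId;
    private readonly string _clientSecret;
    private string _token;
    private DateTime _expiresAt = DateTime.MinValue;
    public TokenProvider(string clientId, string clientSecret)
    {
        _clientId = clientId;
        _clientSecret = clientSecret;
    }
    public async Task<string> GetAccessTokenAsync()
    {
        // Reuse the cached token until one minute before it expires
        if (_token != null && DateTime.UtcNow < _expiresAt.AddMinutes(-1))
        {
            return _token;
        }
        var body = new FormUrlEncodedContent(new Dictionary<string, string>
        {
            ["grant_type"] = "client_credentials",
            ["client_id"] = _clientId,
            ["client_secret"] = _clientSecret
        });
        // Token endpoint as documented for Aspose Cloud; confirm against the current docs
        var response = await _http.PostAsync("https://api.aspose.cloud/connect/token", body);
        response.EnsureSuccessStatusCode();
        string json = await response.Content.ReadAsStringAsync();
        using var doc = System.Text.Json.JsonDocument.Parse(json);
        _token = doc.RootElement.GetProperty("access_token").GetString();
        int expiresIn = doc.RootElement.GetProperty("expires_in").GetInt32();
        _expiresAt = DateTime.UtcNow.AddSeconds(expiresIn);
        return _token;
    }
}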
Recognition Accuracy Problems
- Pre-process images to improve quality before submission
- Experiment with different recognition settings (a comparison sketch follows this list)
- For problematic tables, try breaking them into smaller sections
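A simple way to experiment with settings is to run the same image through a few Options variants and compare the CSV output side by side. This sketch reuses the TableRecognizer class from Step 2; the variants chosen here are just examples.
// Add to Program.cs to compare the effect of different recognition settings
static async Task CompareSettingsAsync(TableRecognizer recognizer, string imagePath)
{
    var variants = new Dictionary<string, TableRecognizer.Options>
    {
        ["defaults"] = new TableRecognizer.Options(),
        ["no spell check"] = new TableRecognizer.Options { MakeSpellCheck = false },
        ["no contrast correction"] = new TableRecognizer.Options { MakeContrastCorrection = false }
    };
    foreach (var variant in variants)
    {
        try
        {
            string csv = await recognizer.RecognizeTableAsync(imagePath, variant.Value);
            Console.WriteLine($"--- {variant.Key} ---");
            Console.WriteLine(csv);
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Variant '{variant.Key}' failed: {ex.Message}");
        }
    }
}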
Performance Considerations
- Implement parallel processing for multiple tables (see the sketch after this list)
- Optimize polling intervals based on your workload
- Consider implementing a webhook endpoint for asynchronous notifications
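For the parallel processing point, cap the degree of concurrency so you stay within your API rate limits. Here is a minimal sketch using SemaphoreSlim; the limit of three concurrent requests is an arbitrary example value.
// Process several table images concurrently, capped at a fixed degree of
// parallelism so the API is not flooded. The limit of 3 is an example value.
static async Task ProcessInParallelAsync(TableRecognizer recognizer, IEnumerable<string> imagePaths, string outputFolder)
{
    Directory.CreateDirectory(outputFolder);
    using var throttle = new SemaphoreSlim(3);
    var tasks = imagePaths.Select(async imagePath =>
    {
        await throttle.WaitAsync();
        try
        {
            string csv = await recognizer.RecognizeTableAsync(imagePath);
            string outputPath = Path.Combine(outputFolder, Path.GetFileNameWithoutExtension(imagePath) + ".csv");
            await File.WriteAllTextAsync(outputPath, csv);
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Failed to process {imagePath}: {ex.Message}");
        }
        finally
        {
            throttle.Release();
        }
    });
    await Task.WhenAll(tasks);
}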
What You’ve Learned
In this advanced tutorial, you’ve learned how to:
- Design and implement a complete table recognition workflow
- Build a production-ready application with proper error handling
- Process and transform the recognized data into usable formats
- Implement batch processing for multiple table images
- Apply best practices for resilient cloud-based applications
Next Steps
Now that you’ve mastered the complete table recognition workflow, you can:
- Integrate this functionality into your existing applications
- Build a web service API around this workflow
- Explore advanced data processing and analysis techniques
- Combine with other OCR capabilities for complete document processing
Further Practice
To reinforce what you’ve learned:
- Create a web application that allows users to upload tables for recognition
- Build a microservice that processes tables from a message queue
- Implement a complete pipeline that extracts tables from PDFs, recognizes them, and stores the data in a database
- Develop a dashboard to monitor and manage table recognition jobs
Helpful Resources
Have questions about this tutorial? Feel free to post them on our support forum.