The Optical Character Recognition feature is not finding text in images embedded in PDF or Office documents.

Article ID: 100053444

Description

Error Message

No indexed text

Cause

The images are embedded in the document and are not extracted as separate items during processing, so OCR never converts them to text.
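To confirm that an Office document actually contains embedded images before changing any case settings, a quick check outside the product can help. The following is a minimal sketch, not part of the product: it assumes the file is in an Office Open XML format (.docx, .xlsx, .pptx), which is a ZIP archive that stores embedded media under a `media/` folder. The function name `list_embedded_images` is hypothetical, and the sketch does not cover legacy binary formats (.doc, .xls, .ppt) or PDF files.

```python
import sys
import zipfile

# Folders where Office Open XML files (.docx, .xlsx, .pptx) keep embedded media.
MEDIA_PREFIXES = ("word/media/", "xl/media/", "ppt/media/")


def list_embedded_images(path):
    """Return the archive names of media files embedded in an OOXML document.

    Hypothetical helper for illustration only; it inspects the ZIP
    structure and does not run OCR or call the product in any way.
    """
    with zipfile.ZipFile(path) as archive:
        return [
            name
            for name in archive.namelist()
            if name.startswith(MEDIA_PREFIXES) and not name.endswith("/")
        ]


if __name__ == "__main__":
    # Usage: python check_images.py report.docx
    for image_name in list_embedded_images(sys.argv[1]):
        print(image_name)
```

If the script lists media files for a document that shows "No indexed text", the image extraction options described in the Resolution are the likely fix.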
 

Resolution

1. In the Case, navigate to: Processing > Settings > Configure processing parameters and features > Hidden, Inserted and Embedded Content.

2. Depending on the document types to be ingested into the case, check the boxes for:

  • Extract images from Office documents
  • Extract images from PDF documents

3. Save the Processing settings.

NOTE: These options must be selected before any data is processed into the case. Once a case contains processed data, the options are grayed out and can no longer be changed.

 

Issue/Introduction

OCR is not finding text in images of image-only PDF or Office documents.