
Amazon Textract

With the Amazon Textract plugin, you can extract text from PDFs, JPGs, and PNGs. Brightspot associates the extracted text with each file, so editors can find the file by searching for words it contains and then use it in their own content.

Configuring Amazon Textract

Administrators perform this task.

Procedure. To configure Amazon Textract:
  1. Ensure the following plugins are enabled:

    • Express: DAM

    • Express: Textract

    For details, see Enabling and disabling plugins.

  2. Obtain the following Textract settings from your AWS console:

    • SQS Queue Name

    • Topic ARN

    • Role ARN

  3. From the Navigation Menu, select Sites & Settings.

  4. In the Sites widget, select Global. The Edit Global widget appears.

  5. Configure the interface with AWS Textract by doing the following:

    1. Under Main, expand AWS Textract, and enter the SQS Queue Name, Topic ARN, and Role ARN you obtained in step 2.

    2. In the Minimum Block Confidence field, enter the minimum confidence level required for the text in each block. Higher values generally produce more accurate results (fewer false positives) but may miss some text (more false negatives).

  6. Configure the document data extractor and thumbnail generator by doing the following:

    1. Expand DAM Document Data Extraction Settings.

    2. Under Extractor Services, click the add icon (a plus sign in a circle), and select Textract Document Data Extractor. A form appears.

    3. From the Thumbnail Extractor list, select Pdf Document Data Extractor.

  7. Click Save.

Textract is configured, and editors can view the results of a text extraction in the content edit form.
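The three AWS values entered in step 5 match the asynchronous workflow that AWS documents for Textract: Textract publishes a job-completion message to the SNS topic (Topic ARN), using the IAM role (Role ARN) for permission to publish, and an SQS queue subscribed to that topic receives the message. Whether Brightspot uses exactly this workflow is an assumption here. The following Python sketch uses a hypothetical `TextractSettings` class and hypothetical `validate` and `notification_channel` helpers (none of them Brightspot APIs) to catch obvious typos in the values before you save them, and to show how the Topic ARN and Role ARN map onto the `NotificationChannel` parameter of Textract's `StartDocumentTextDetection` operation. The account ID, queue, topic, and role names are placeholders.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical container for the values entered under Main > AWS Textract.
# The field names are illustrative and are not Brightspot identifiers.
@dataclass(frozen=True)
class TextractSettings:
    sqs_queue_name: str
    topic_arn: str
    role_arn: str
    min_block_confidence: float  # Textract confidence scores range from 0 to 100


# Loose format checks: they catch typos, not missing AWS permissions.
SQS_QUEUE_NAME = re.compile(r"^[A-Za-z0-9_-]{1,80}(\.fifo)?$")
SNS_TOPIC_ARN = re.compile(r"^arn:aws[a-z-]*:sns:[a-z0-9-]+:\d{12}:[A-Za-z0-9_.-]+$")
IAM_ROLE_ARN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


def validate(settings: TextractSettings) -> List[str]:
    """Return a list of problems; an empty list means the values look plausible."""
    problems: List[str] = []
    if not SQS_QUEUE_NAME.match(settings.sqs_queue_name):
        problems.append("SQS Queue Name is not a valid queue name")
    if not SNS_TOPIC_ARN.match(settings.topic_arn):
        problems.append("Topic ARN is not an SNS topic ARN")
    if not IAM_ROLE_ARN.match(settings.role_arn):
        problems.append("Role ARN is not an IAM role ARN")
    if not 0 <= settings.min_block_confidence <= 100:
        problems.append("Minimum Block Confidence must be between 0 and 100")
    return problems


def notification_channel(settings: TextractSettings) -> Dict[str, str]:
    """Build the NotificationChannel parameter of StartDocumentTextDetection."""
    return {"SNSTopicArn": settings.topic_arn, "RoleArn": settings.role_arn}


settings = TextractSettings(
    sqs_queue_name="textract-results",
    topic_arn="arn:aws:sns:us-east-1:123456789012:AmazonTextractResults",
    role_arn="arn:aws:iam::123456789012:role/TextractPublishRole",
    min_block_confidence=80.0,
)
print(validate(settings))  # []
print(notification_channel(settings))
```

The checks are deliberately loose: a value that passes can still fail at run time if the role lacks permission to publish to the topic or the queue is not subscribed to it.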

Extracting text from a PDF or image

You can use Amazon Textract to extract text from PDFs and images that you upload to Brightspot. When editors search for words that appear in the extracted text, Brightspot includes the uploaded file in the search results.

Procedure. To submit a PDF or image to Amazon Textract:
  1. From the Quick Start widget, select Document. A content edit form appears.

  2. From the File list, select New Upload.

  3. Click Choose, and navigate to a file you want to upload to Brightspot. The file must be a PDF, PNG, or JPG.

  4. Complete the content edit form as required.

  5. Complete your site's workflow and publish the item.

  6. Expand Extracted Data, and review the extracted text and thumbnail.
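Brightspot's own upload validation is not described here. As a hypothetical pre-check before step 3, the following Python sketch identifies a PDF, PNG, or JPG by its leading signature bytes rather than by its file extension, so that a renamed file of some other type is rejected. The `detect_type` and `is_supported_upload` names are illustrative only.

```python
from pathlib import Path
from typing import Optional

# Leading "magic" bytes that identify the file types the procedure accepts.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPG",
}


def detect_type(header: bytes) -> Optional[str]:
    """Return "PDF", "PNG", or "JPG" if header starts with a known signature."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def is_supported_upload(path: Path) -> bool:
    """Check the file's first bytes instead of trusting its extension."""
    with path.open("rb") as f:
        header = f.read(8)  # the longest signature above (PNG) is 8 bytes
    return detect_type(header) is not None


print(detect_type(b"%PDF-1.7\n"))  # PDF
print(detect_type(b"GIF89a"))      # None
```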

Figure 105. Image submitted to Amazon Textract


Figure 106. Result from Amazon Textract
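Textract returns its results as a list of `Blocks`, each with a `BlockType` (such as `PAGE`, `LINE`, or `WORD`), a `Confidence` score from 0 to 100, and, for text blocks, a `Text` value. Assuming that Brightspot applies the Minimum Block Confidence setting by discarding blocks that score below it (an assumption, not documented behavior), the following Python sketch shows how such a threshold turns a response into extracted text. The `extract_text` function and the sample blocks are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def extract_text(blocks: Iterable[Dict[str, Any]], min_confidence: float) -> str:
    """Join the text of LINE blocks whose confidence meets min_confidence.

    Only LINE blocks are used because WORD blocks repeat the same text
    word by word, and PAGE blocks carry no text of their own.
    """
    lines: List[str] = []
    for block in blocks:
        if block.get("BlockType") != "LINE":
            continue
        if block.get("Confidence", 0.0) < min_confidence:
            continue  # below the threshold: treat the line as unreliable
        lines.append(block.get("Text", ""))
    return "\n".join(lines)


# Hypothetical, heavily trimmed response blocks (real blocks also carry Id,
# Geometry, Relationships, and other fields).
sample_blocks = [
    {"BlockType": "PAGE", "Confidence": 99.9},
    {"BlockType": "LINE", "Confidence": 99.2, "Text": "First sample line"},
    {"BlockType": "WORD", "Confidence": 99.2, "Text": "First"},
    {"BlockType": "LINE", "Confidence": 41.5, "Text": "Smudged sample line"},
    {"BlockType": "LINE", "Confidence": 87.0, "Text": "Second sample line"},
]

print(extract_text(sample_blocks, min_confidence=80))
```

Raising `min_confidence` drops the low-scoring "Smudged sample line", which illustrates the trade-off described in step 5: fewer false positives at the cost of possibly missing real text.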


When editors search for keywords in the search panel, Brightspot includes the submitted PDF or image in the search results. For example, if an editor searches for the word "gallon" in the search panel, Brightspot includes the image shown in Figure 105 in the search results.
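Brightspot's search engine and its tokenization rules are not described here. As a simplified model of why a keyword search finds an uploaded file, the following Python sketch builds an inverted index from each file's extracted text and returns the files that contain every query word, ignoring case. The `build_index` and `search` helpers and the file IDs are hypothetical; a production search engine may also apply stemming, so that, for example, "gallons" matches "gallon", which this sketch does not do.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(extracted: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the IDs of the files whose extracted text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for file_id, text in extracted.items():
        for word in tokenize(text):
            index[word].add(file_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the IDs of files that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical extracted text keyed by hypothetical file IDs.
index = build_index({
    "image-1": "Holds one gallon of water",
    "pdf-7": "Quarterly report",
})
print(search(index, "gallon"))  # {'image-1'}
print(search(index, "Gallon"))  # {'image-1'}
```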

You can modify the thumbnail as necessary. For details, see Preparing an image for publication.