Why do the processing statistics for a batch not match the "De-duplication by Custodian" report?

Article ID: 100015898

Description

Error Message

No error message is displayed.

Cause

The unique and de-dupe values in the custodian de-dupe report are calculated considering only the documents in that single processing batch. When determining if a document is unique, the custodian de-dupe report does not consider earlier processing batches in the case. This behaviour can be confirmed by processing the same set of documents in two separate processing batches. The custodian de-dupe report for each batch will give identical unique and de-dupe counts.
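The two-batch experiment described above can be modelled with a short sketch. This is only an illustration of the counting behaviour, not the product's implementation: the `(custodian, content_hash)` tuples, the function names, and the assumption that duplicates are identified by a content hash are all hypothetical.

```python
from collections import defaultdict

def custodian_report(batch):
    """Per-custodian (unique, total) counts using ONLY the documents in `batch`.

    Hypothetical model: a document is a (custodian, content_hash) pair.
    """
    hashes = defaultdict(set)
    totals = defaultdict(int)
    for custodian, content_hash in batch:
        hashes[custodian].add(content_hash)
        totals[custodian] += 1
    return {c: (len(hashes[c]), totals[c]) for c in totals}

def new_unique_in_case(earlier_batches, batch):
    """Count hashes in `batch` that no earlier batch in the case contains."""
    seen = {h for earlier in earlier_batches for _, h in earlier}
    return len({h for _, h in batch} - seen)

# The same three source documents are processed twice, as two batches.
batch_1 = [("alice", "h1"), ("alice", "h2"), ("alice", "h2")]
batch_2 = list(batch_1)

report_1 = custodian_report(batch_1)
report_2 = custodian_report(batch_2)

# Case-level view: how many documents in each batch are new to the case.
new_in_batch_1 = new_unique_in_case([], batch_1)
new_in_batch_2 = new_unique_in_case([batch_1], batch_2)
```

In this model both per-batch reports are identical, while the case-level count for the second batch drops to zero, which is the kind of mismatch the article describes.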

The custodian de-dupe report counts a document as unique only if one copy of that document was processed against that specific custodian in that same processing batch. Uniqueness is therefore determined at the custodian level, not just at the batch level. For example, suppose a single processing batch contains source documents for two custodians, and a copy of the same document exists in the source directory of each custodian. The batch contains only one unique document, but the report displays a count of one unique document against each custodian. The de-dupe value next to each custodian is 0% in this example: because the document was processed only once against each custodian, it is considered unique to each custodian.
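The two-custodian example can be worked through in a few lines. This is a sketch under stated assumptions: the data layout, the function name `dedupe_by_custodian`, and the percentage formula (duplicates divided by documents processed for that custodian) are hypothetical and are not taken from the product.

```python
from collections import defaultdict

def dedupe_by_custodian(batch):
    """Return {custodian: (unique_count, dedupe_percent)} for one batch.

    Assumed formula: dedupe % = 100 * (processed - unique) / processed,
    evaluated separately for each custodian.
    """
    docs = defaultdict(list)
    for custodian, content_hash in batch:
        docs[custodian].append(content_hash)
    report = {}
    for custodian, hashes in docs.items():
        unique = len(set(hashes))
        dedupe_pct = 100.0 * (len(hashes) - unique) / len(hashes)
        report[custodian] = (unique, dedupe_pct)
    return report

# One batch, two custodians, each holding a copy of the same document.
batch = [("custodian_a", "doc_hash_1"), ("custodian_b", "doc_hash_1")]

report = dedupe_by_custodian(batch)

# Batch-level view: distinct documents regardless of custodian.
batch_unique = len({content_hash for _, content_hash in batch})
```

Under these assumptions each custodian shows one unique document and 0% de-dupe, although the batch as a whole holds a single distinct document.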

Note: The custodian de-dupe report is likely to match the processing statistics only when the batch is the first processing batch against one or more custodians.
 

Resolution

The following workflow can be used to determine the new unique documents added to a case from a single processing batch:

  1. In the "Identifiers" section of Advanced Search, select "Any of these processing batches" and select all of the processing batches up to, but not including, the processing batch of interest. Save the search results in a folder, e.g. "before_batch_X".
     
  2. In Advanced Search, click "Clear" to reset the search options. In the "Scope" section, select the new folder "before_batch_X" under "None of these folders". Then, in the "Identifiers" section, select "Any of these processing batches" and select only the batch of interest (i.e. batch X). Run the search; the results show the documents newly added in batch X. The search results can then be filtered further by custodian (and by document type, if required). The "Report" page in "Analysis & Review" can be exported, and includes the unique document counts by custodian.
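The logic of the two searches above amounts to a set difference. The sketch below mirrors it on in-memory data; it does not call any product API, and the document records (`id`, `batches`, `custodians`), batch names, and function names are hypothetical.

```python
from collections import Counter

def new_documents_in_batch(documents, earlier_batches, batch_of_interest):
    """Return documents in the batch of interest that no earlier batch contains."""
    earlier = set(earlier_batches)
    # Step 1: the "before_batch_X" folder holds every document that belongs
    # to any of the earlier processing batches.
    before_folder = {d["id"] for d in documents if d["batches"] & earlier}
    # Step 2: keep documents in the batch of interest that are in
    # none of the folder's documents.
    return [
        d for d in documents
        if batch_of_interest in d["batches"] and d["id"] not in before_folder
    ]

def count_by_custodian(docs):
    """Count documents per custodian; a document can count for several custodians."""
    counts = Counter()
    for d in docs:
        counts.update(d["custodians"])
    return dict(counts)

# Hypothetical case contents after two processing batches.
documents = [
    {"id": 1, "batches": {"batch_1"}, "custodians": {"alice"}},
    {"id": 2, "batches": {"batch_1", "batch_2"}, "custodians": {"alice"}},
    {"id": 3, "batches": {"batch_2"}, "custodians": {"alice", "bob"}},
    {"id": 4, "batches": {"batch_2"}, "custodians": {"bob"}},
]

new_docs = new_documents_in_batch(documents, ["batch_1"], "batch_2")
counts = count_by_custodian(new_docs)
```

Document 2 is excluded because it was already present in an earlier batch, which is the effect of the "None of these folders" scope in step 2.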

 

Issue/Introduction

After processing a new set of documents for a single custodian, the "Unique Documents" and "De-duplication Percentage" values in the "De-duplication by Custodian" processing report ("Processing > Reports") do not match the "Unique documents indexed" and "Deduplication %" values on the "Processing Statistics" tab of the "Processing > Processing Status" page.