The unique and de-dupe values in the custodian de-dupe report are calculated from the documents in that single processing batch only. When determining whether a document is unique, the report does not consider earlier processing batches in the case. This behaviour can be confirmed by processing the same set of documents in two separate processing batches: the custodian de-dupe report for each batch will give identical unique and de-dupe counts.
The custodian de-dupe report counts a document as unique only if one copy of that document was processed against that specific custodian in that same processing batch. Uniqueness is therefore determined at the custodian level, not just at the batch level. For example, if a single processing batch contains source documents for two custodians, and a copy of the same document exists in the source directory of each custodian, the report will display a count of one unique document against each custodian, even though the batch contains only one unique document. The de-dupe value next to each custodian will also be 0% in this example: because the document was processed only once against each custodian, it is considered unique to each custodian.
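The counting behaviour described above can be illustrated with a small sketch. This is not the product's actual implementation. It assumes that documents are identified by a content hash, that the first copy seen for a custodian counts as unique and any further copies count as duplicates, and that the de-dupe percentage is duplicates divided by total documents. The function name, field names, and sample data are hypothetical.

```python
from collections import defaultdict


def custodian_dedupe_report(batch):
    """Summarise ONE processing batch per custodian.

    `batch` is a list of (custodian, doc_hash) pairs. Earlier batches are
    deliberately not consulted, mirroring the report's batch-only scope.
    """
    hashes_seen = defaultdict(set)  # custodian -> distinct hashes in this batch
    totals = defaultdict(int)       # custodian -> documents processed in this batch

    for custodian, doc_hash in batch:
        hashes_seen[custodian].add(doc_hash)
        totals[custodian] += 1

    report = {}
    for custodian, total in totals.items():
        unique = len(hashes_seen[custodian])
        duplicates = total - unique
        report[custodian] = {
            "unique": unique,
            "duplicates": duplicates,
            # Assumed formula: share of this custodian's documents that were duplicates.
            "dedupe_pct": 100.0 * duplicates / total,
        }
    return report


# The same document sits in the source directory of two custodians.
batch_1 = [("Custodian A", "hash-1"), ("Custodian B", "hash-1")]
# The same documents are processed again in a second, separate batch.
batch_2 = list(batch_1)

report_1 = custodian_dedupe_report(batch_1)
report_2 = custodian_dedupe_report(batch_2)
```

Under these assumptions, each custodian in `report_1` shows one unique document and a de-dupe value of 0%, and `report_2` is identical to `report_1` because the second batch is evaluated without reference to the first.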
Note: The custodian de-dupe report is likely to match the processing stats results only when the batch is the first processing batch against one or more custodians.
The following workflow can be used to determine the new unique documents added to a case from a single processing batch: