Discovery Accelerator exports return similar items even though 'Exclude similar items' is checked

Article ID: 100046769

Cause

At search time, a 'Hash Id' is calculated from a subset of each item's metadata, and the metadata analysis uses this value to identify similar items. One element of this subset is 'Content', which Enterprise Vault (EV) truncates to 120 characters in the item's metadata. As a result, two items can share the same first 120 characters, and therefore the same Hash Id, even though the rest of their content differs.
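The sketch below illustrates why truncation causes this. It is a hypothetical model, not EV's actual implementation: the field names (`subject`, `author`, `content`) and the use of SHA-256 are assumptions chosen for illustration, since this article does not document the real metadata subset or hash algorithm. Only the 120-character truncation of 'Content' comes from the text above.

```python
import hashlib

# From the article: EV truncates 'Content' to 120 characters in item metadata.
CONTENT_LIMIT = 120


def similarity_hash(item: dict) -> str:
    """Hypothetical Hash Id over a metadata subset; field names and SHA-256 are assumptions."""
    subset = [
        item.get("subject", ""),
        item.get("author", ""),
        item.get("content", "")[:CONTENT_LIMIT],  # only the first 120 characters contribute
    ]
    joined = "\x1f".join(subset)  # unit separator keeps field boundaries distinct
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# A shared opening longer than 120 characters (51 chars x 3 = 153).
shared_prefix = "Please review the attached contract before Friday. " * 3
item_a = {"subject": "Contract", "author": "alice", "content": shared_prefix + "Approved."}
item_b = {"subject": "Contract", "author": "alice", "content": shared_prefix + "Rejected."}

print(similarity_hash(item_a) == similarity_hash(item_b))  # True: same Hash Id
print(item_a["content"] == item_b["content"])  # False: full content differs
```

Because the differing text ("Approved." versus "Rejected.") falls after the 120-character cut-off, the two items are treated as similar even though a reviewer would read them differently.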

So, even when items are similar according to the Hash Id, a reviewer may decide that they are not similar and mark one item as Relevant and the other as Not-Relevant (or apply other marks as appropriate).

Reference: Accelerator Deduplication Whitepaper, https://www.veritas.com/content/support/en_US/doc/ev_all_accelerator_deduplication_00

Resolution

This behavior is by design. When exporting, DA takes the marks applied to items into account, because the analysis of items by legal auditors is considered more authoritative than the automated similarity analysis. Consequently, similar items that have been marked differently are treated as distinct and are included in the export.
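As a rough illustration of this rule, the following sketch models an export filter that keeps one item per combination of Hash Id and reviewer mark. It is a hypothetical model, not DA's implementation: the `ReviewedItem` type, the `exclude_similar` function, and the sample Hash Id values are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ReviewedItem:
    item_id: str
    hash_id: str  # similarity Hash Id computed at search time (hypothetical value)
    mark: str     # reviewer mark, e.g. "Relevant" or "Not-Relevant"


def exclude_similar(items: list[ReviewedItem]) -> list[ReviewedItem]:
    """Hypothetical filter: keep the first item for each (Hash Id, mark) pair."""
    seen: set[tuple[str, str]] = set()
    kept: list[ReviewedItem] = []
    for item in items:
        key = (item.hash_id, item.mark)  # the mark is part of the key, so differing marks survive
        if key not in seen:
            seen.add(key)
            kept.append(item)
    return kept


items = [
    ReviewedItem("A", "h1", "Relevant"),
    ReviewedItem("B", "h1", "Relevant"),      # similar to A, same mark: excluded
    ReviewedItem("C", "h1", "Not-Relevant"),  # similar to A, different mark: exported
]
print([i.item_id for i in exclude_similar(items)])  # ['A', 'C']
```

Including the mark in the grouping key reflects the design choice described above: a reviewer's decision that two items differ overrides the automated similarity result.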

Note: The same applies to 'Exclude duplicate items', because the content and metadata analysis used to identify duplicate items can return false positives, as discussed in the Accelerator Deduplication Whitepaper referenced above.

Issue/Introduction

Discovery Accelerator (DA) exports contain similar items even though 'Exclude similar items' was checked when the export was created.