When Enterprise Vault (EV) archives messages through journal archiving, user mailbox archiving, or shared mailbox archiving, the archiving process assigns each unique message a unique TransactionID. Under certain conditions, multiple copies of the same message can each be treated as unique. Those conditions include, but are not limited to:
Deduplication of messages takes place when those messages are stored. Single instance storage (SIS) is the part of the process that EV uses to store one instance of the archived copy of the message in Digital Vault Store (DVS) or Digital Vault File (DVF) format. When the same message is archived multiple times, for example from different user mailboxes or different journal mailboxes, the recipient information is added as a sharer to the DVS file or to that file's information in the database. Only one copy of the DVS file exists, but a search can return multiple instances of it.
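The sharer idea described above can be pictured with a small Python sketch. This is a toy model only, not EV's actual storage format or API: the names SingleInstanceStore, StoredItem, archive, and search_hits are hypothetical, and the "fingerprint" string stands in for whatever EV uses internally to recognize the same message.

```python
from dataclasses import dataclass, field


@dataclass
class StoredItem:
    """One stored copy of a message plus the archives that share it (hypothetical)."""
    fingerprint: str
    sharers: list[str] = field(default_factory=list)


class SingleInstanceStore:
    """Toy single instance store: one stored item per fingerprint, many sharers."""

    def __init__(self) -> None:
        self._items: dict[str, StoredItem] = {}

    def archive(self, fingerprint: str, archive_name: str) -> StoredItem:
        # Reuse the stored item if this message was archived before,
        # otherwise create it. Either way, record the archive as a sharer.
        item = self._items.setdefault(fingerprint, StoredItem(fingerprint))
        item.sharers.append(archive_name)
        return item

    def stored_count(self) -> int:
        # Number of physical copies held, regardless of how many sharers exist.
        return len(self._items)

    def search_hits(self, fingerprint: str) -> list[str]:
        # One hit per sharer, even though only one stored copy exists.
        item = self._items.get(fingerprint)
        return list(item.sharers) if item else []


store = SingleInstanceStore()
for name in ("journal-1", "alice", "bob"):
    store.archive("msg-abc", name)

print(store.stored_count())           # 1
print(store.search_hits("msg-abc"))   # ['journal-1', 'alice', 'bob']
```

In this model the message is stored once but surfaces three times in a search, which is the behavior the paragraph above describes.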
When Compliance Accelerator (CA) or Discovery Accelerator (DA) runs a search, the search returns hits based on the search criteria. If the search covers only journal archives, duplicates of a message are returned whenever EV assigned multiple unique Transaction IDs to that message. If the search covers user archives, each recipient's copy of the message (and possibly the sender's copy) is returned in the search hits. If the search covers both journal and user archives, copies are returned for each archived recipient and for each unique copy in the journal archive.
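As a rough illustration of the three cases above, the following Python sketch counts the hits one message would produce under each search scope. It is a simplified model of the paragraph, not CA or DA behavior as implemented: the function name expected_hits, its parameters, and the scope labels are all hypothetical.

```python
def expected_hits(journal_transaction_ids, user_archive_copies, scope):
    """Estimate how many hits a single message yields for a given search scope.

    journal_transaction_ids: Transaction IDs assigned to the journal copies
    user_archive_copies: user archives holding a copy (recipients, possibly sender)
    scope: "journal", "user", or "both" (hypothetical labels)
    """
    # Each unique Transaction ID in the journal archive counts as one hit.
    journal_hits = len(set(journal_transaction_ids))
    # Each user archive that holds a copy counts as one hit.
    user_hits = len(set(user_archive_copies))

    if scope == "journal":
        return journal_hits
    if scope == "user":
        return user_hits
    if scope == "both":
        return journal_hits + user_hits
    raise ValueError(f"unknown scope: {scope!r}")
```

For example, a message journaled under two Transaction IDs and archived from three user mailboxes gives expected_hits(["t1", "t2"], ["alice", "bob", "carol"], "both") == 5 in this model.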
CA and DA have the option to export or Produce the search results that have been accepted into the review sets for CA Departments or DA Cases. CA and DA will export all items within a Department or Case, within a specific Search, or within other export or Production run criteria. If duplicates of messages exist in the Department / Case / Search, they will be exported or Produced. Some options exist to remove or reduce the duplicates, either before or after the export / Production run.
For all of the 9 scenarios presented above, two possible solutions currently exist for obtaining only 1 instance of duplicate items when those items are considered unique enough that the deduplication processing within CA searches or DA review sets cannot recognize them as duplicates.
1) Place a specific Mark (in DA), a comment (in CA or DA), or a review status (in CA or DA) on only one instance of each set of duplicate messages, and apply the same Mark, comment, or review status to all other items that you want to export or Produce. Then export or Produce only the items that have that Mark, comment, or review status. This method de-duplicates items from within CA or DA.
2) Use third-party utilities. Such utilities can scan through PST files to remove duplicate copies of messages. They can be used on PST files before those files are ingested into EV and searched in CA or DA, or they can be used on the PST files that CA or DA creates during an export or Production run. Veritas does not maintain a listing of all such utilities.
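To give a sense of what a utility of the kind described in option 2 might do, here is a minimal Python sketch of fingerprint-based de-duplication. It does not read PST files and does not represent any particular product: it assumes the messages have already been extracted into simple records, and the Message class, its fields, and the fingerprint rule (Message-ID when present, otherwise a hash of sender, subject, sent time, and body) are illustrative assumptions.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical record for a message already extracted from a PST file."""
    message_id: str  # Internet Message-ID header, may be empty
    sender: str
    subject: str
    sent: str        # sent timestamp as text
    body: str


def fingerprint(msg: Message) -> str:
    # Prefer the Message-ID header when the message has one.
    if msg.message_id.strip():
        return "id:" + msg.message_id.strip()
    # Otherwise fall back to a hash of fields that identify the content.
    parts = [msg.sender.strip().lower(), msg.subject.strip(), msg.sent.strip(), msg.body]
    joined = "\x1f".join(parts)  # unit separator keeps fields from running together
    return "hash:" + hashlib.sha256(joined.encode("utf-8")).hexdigest()


def deduplicate(messages):
    """Return the messages with later duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for msg in messages:
        key = fingerprint(msg)
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique
```

The choice of fingerprint fields decides what counts as a duplicate. A stricter rule (for example, including the recipient list) would keep more copies, while a looser rule would remove more, so any real utility should be checked against the requirements of the export or Production run.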