Duplicate items in Enterprise Vault (EV) Discovery Accelerator (DA) searches

Article ID: 100001528

Resolution

A quick test to determine whether items are duplicates is as follows:

- Select a suspected duplicate item in the Review Set.
- Right-click the item and select 'Copy item details to clipboard'.
- Open Notepad and paste the item details.
- Repeat for the second suspected duplicate item and paste the item information below the first item's information in Notepad.
- Find the TransactionID of both items. The TransactionID is the last 32 characters of the SavesetID, as in the example below:
  SavesetID: 201211253804496~201001071720230000~Z~00C0ECB643A150EB89DA8C5F08BFEE21
  TransactionID: 00C0ECB643A150EB89DA8C5F08BFEE21
- Compare the TransactionIDs of both suspected duplicates.
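
If many pairs need checking, the comparison in the steps above can be scripted. The sketch below assumes only what the article states: the TransactionID is the last 32 characters of the SavesetID. The function names are hypothetical, and the SavesetID strings are assumed to have already been copied out of the pasted item details.

```python
def transaction_id(saveset_id: str) -> str:
    """Return the TransactionID: the last 32 characters of a SavesetID."""
    saveset_id = saveset_id.strip()
    if len(saveset_id) < 32:
        raise ValueError("SavesetID is shorter than 32 characters")
    return saveset_id[-32:]


def same_transaction(saveset_a: str, saveset_b: str) -> bool:
    """True if two SavesetIDs share the same TransactionID."""
    return transaction_id(saveset_a) == transaction_id(saveset_b)


sid = "201211253804496~201001071720230000~Z~00C0ECB643A150EB89DA8C5F08BFEE21"
print(transaction_id(sid))  # 00C0ECB643A150EB89DA8C5F08BFEE21
```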

If the TransactionIDs are different, the items are not duplicates. If the TransactionIDs are the same, there are several scenarios in which Enterprise Vault (EV) Discovery Accelerator (DA) may appear to return duplicate items, or in which duplicate items might be expected in a review set. These scenarios include:
  • Scenario 1: One search, one archive
  • Scenario 2: One search, multiple archives
  • Scenario 3: Multiple searches, one archive
  • Scenario 4: Multiple searches, multiple archives

The technical reasons for this behavior are described in Technical Note 000039650 - "Deduplication of items found in Veritas Enterprise Vault (EV) Compliance Accelerator (CA) or Discovery Accelerator (DA) review sets for export / Production.", available in the Related Articles section at the end of this document.

Scenario 1: One search, one archive

A search run in DA against a single archive may appear to return duplicate items. This can happen when several similar items are archived in that archive and all of them match the search criteria. Closer inspection should show that the items differ in some small way. For instance, if an email was forwarded multiple times, each archived copy would have a slightly different time.

Scenario 2: One search, multiple archives

A search run in DA against multiple archives can often appear to return duplicate items. This can happen in the following scenario:

* Enterprise Vault performs Exchange mailbox archiving
* Enterprise Vault performs Exchange journal archiving
1) User A sends a message to User B and User C
2) The Exchange journal receives a copy of the item and Enterprise Vault Exchange journaling archives the item
3) User A archives her sent item
4) User B and User C both let their respective items get archived via Enterprise Vault mailbox archiving, with User C moving the message to a "Projects" folder prior to archiving.
5) A Discovery Accelerator search is performed against:
   a. The Exchange journal archive
   b. User A's archive
   c. User B's archive
   d. User C's archive

Since each item was a separate (but related) item within Exchange, Enterprise Vault archives each item separately. Single instance storage may apply, in which case all of the items might refer to one common archived item (DVSSP). Discovery Accelerator inspects the indexes of the archives it is told to search and obtains matching items from each index. If the search looks only for a specific subject line, it can return a hit from each of the following:
 
1) User A's item archived from User A's "Sent Items" folder
2) User B's item archived from User B's "Inbox" folder
3) User C's item archived from User C's "Projects" folder
4) The Exchange journal archive

Fewer results may be returned if, for instance, User C deleted the item before it was archived under the Enterprise Vault mailbox policy that applies to User C's mailbox.
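
To illustrate why one search across several archives returns several hits, the sketch below models each archived copy of User A's message as a separate item. It is a simplified model, not DA's search implementation: the `ArchivedItem` class, the `search` function, the item IDs and the subject "Q3 budget" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivedItem:
    item_id: str  # hypothetical ID: each archived copy is a separate item
    archive: str
    folder: str
    subject: str


def search(items, archives, subject):
    """Return every item in the selected archives whose subject matches."""
    return [i for i in items if i.archive in archives and i.subject == subject]


# One message from User A, archived four times (hypothetical data).
archived = [
    ArchivedItem("j1", "Exchange journal", "", "Q3 budget"),
    ArchivedItem("a1", "User A", "Sent Items", "Q3 budget"),
    ArchivedItem("b1", "User B", "Inbox", "Q3 budget"),
    ArchivedItem("c1", "User C", "Projects", "Q3 budget"),
]
all_archives = {"Exchange journal", "User A", "User B", "User C"}

hits = search(archived, all_archives, "Q3 budget")
print(len(hits))  # 4: four distinct items that carry the same content

# If User C had deleted the message before it was archived, only 3 hits remain.
without_c = [i for i in archived if i.archive != "User C"]
print(len(search(without_c, all_archives, "Q3 budget")))  # 3
```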

Scenario 3: Multiple searches, one archive

Multiple searches run in DA against a single archive may overlap. However, this overlap does not produce duplicate items in the review set if "Include items already in review" is unchecked on the search. The items do not need to be in the review set already; the check is made when the search is accepted.

For each item that is added to the review set, the search that found the item is recorded for later use in the Search facet.

Example:

"search1" returns 100 results (it has not been accepted yet)
"search2" returns 100 results, 30 of which are the same items returned by "search1"

Both searches are accepted. Assuming none of these items is already in the review set, 170 items (100 + 100 - 30) are added to the review set for this case. In the Discovery Accelerator 8.0 client, expanding the "Search" facet shows how many items from each search have been added to the review set. Reviewing an "overlap" item in one search also reviews it in the other search, because it is the same item.
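
The arithmetic in the example can be reproduced with a simplified model of acceptance in which the review set is keyed by item, so an item found by two searches is stored once and simply records both searches (as the Search facet does). This is an illustrative sketch of the behavior described above with "Include items already in review" unchecked, not DA's implementation; `accept_searches` and the `itemN` IDs are hypothetical.

```python
from collections import Counter


def accept_searches(searches, review_set=None):
    """Model accepting searches into a review set keyed by item ID.

    Each item is stored once; the names of the searches that found it
    are recorded, mirroring the Search facet.
    """
    review_set = {} if review_set is None else review_set
    for search_name, item_ids in searches.items():
        for item_id in item_ids:
            review_set.setdefault(item_id, set()).add(search_name)
    return review_set


search1 = [f"item{n}" for n in range(0, 100)]   # 100 hits
search2 = [f"item{n}" for n in range(70, 170)]  # 100 hits, 30 shared with search1

review = accept_searches({"search1": search1, "search2": search2})
print(len(review))  # 170 = 100 + 100 - 30

# Per-search counts, as the Search facet would show them.
facet = Counter(name for names in review.values() for name in names)
print(facet["search1"], facet["search2"])  # 100 100
```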

Scenario 4: Multiple searches, multiple archives

Running multiple searches in DA against multiple archives combines scenarios 2 and 3. In scenario 3, duplicates might be expected, but the review set does not contain duplicate items. In scenario 2, "duplicate" items are found, but they are different items within Enterprise Vault that came from different items within Exchange. The multiple searches do not produce "duplicates" between themselves (scenario 3), but searching multiple archives may present essentially the same content several times (User A's message in User B's Inbox, User A's message in User C's Projects folder, and so on).
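
The combined effect can be sketched by feeding hits from two searches across several archives into a review set keyed by item. The item found by both searches is stored once (scenario 3), yet grouping the stored items by subject still shows the same content once per archive (scenario 2). All data, item IDs and the grouping-by-subject step are hypothetical illustrations, not how DA identifies related content.

```python
from collections import defaultdict

# Hits from two hypothetical searches across several archives:
# (item_id, archive, subject). "b1" is found by both searches.
search1 = [("j1", "Exchange journal", "Q3 budget"),
           ("a1", "User A", "Q3 budget"),
           ("b1", "User B", "Q3 budget")]
search2 = [("b1", "User B", "Q3 budget"),
           ("c1", "User C", "Q3 budget")]

# Review set keyed by item ID: an item found twice is stored once.
review = {}
for search_name, hits in (("search1", search1), ("search2", search2)):
    for item_id, archive, subject in hits:
        entry = review.setdefault(
            item_id, {"archive": archive, "subject": subject, "searches": set()})
        entry["searches"].add(search_name)

# Group the distinct review-set items by subject to expose repeated content.
archives_by_subject = defaultdict(list)
for entry in review.values():
    archives_by_subject[entry["subject"]].append(entry["archive"])

print(len(review))  # 4: b1 is stored once although both searches found it
print(archives_by_subject["Q3 budget"])
# ['Exchange journal', 'User A', 'User B', 'User C']
```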