Update Checksum for Emails
Background
De-duplication of documents depends on matching checksums. Microsoft Office and IBM Notes clients have changed the way they calculate checksums. As a result, some email documents may not be de-duplicated from one version to the next.
For example, in release 10.1, Veritas eDiscovery Platform upgraded the versions of Microsoft Office and IBM Notes. For cases with previously indexed data, new emails indexed in version 10.1 or later may not de-duplicate entirely against the emails already in the case. In addition to the de-duplication impact, some users may experience retrieval errors after upgrading to 10.1 or later.
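The effect described above can be illustrated with a minimal sketch. It hashes the same email under two hypothetical checksum rules and shows that a copy indexed under the older rule is not recognized as a duplicate until its stored checksum is recomputed. The field names, the two functions, and the use of MD5 are assumptions made for illustration only; they do not represent the platform's actual checksum algorithm.

```python
import hashlib


def checksum_v1(email: dict) -> str:
    """Hypothetical older rule: hash the fields exactly as extracted."""
    parts = [email["subject"], email["sender"], email["body"]]
    return hashlib.md5("\n".join(parts).encode("utf-8")).hexdigest()


def checksum_v2(email: dict) -> str:
    """Hypothetical newer rule: trailing whitespace is stripped from the body."""
    parts = [email["subject"], email["sender"], email["body"].rstrip()]
    return hashlib.md5("\n".join(parts).encode("utf-8")).hexdigest()


# The same message, first indexed before the upgrade and re-collected after it.
email = {
    "subject": "Q3 report",
    "sender": "a@example.com",
    "body": "See attached.\r\n",
}

# Index built before the upgrade holds checksums from the older rule.
old_index = {checksum_v1(email)}

# After the upgrade, new emails are hashed with the newer rule,
# so the lookup against the old index misses.
is_duplicate_without_update = checksum_v2(email) in old_index

# Recomputing the stored checksum with the newer rule restores the match.
updated_index = {checksum_v2(email)}
is_duplicate_after_update = checksum_v2(email) in updated_index

print(is_duplicate_without_update, is_duplicate_after_update)
```

In this sketch, recomputing the stored checksums plays the role that the Update checksum for emails feature plays for an affected case.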
IMPORTANT: After upgrading to 10.1, a warning message (see Figure 1) is shown when you select a case that requires an email checksum update.
Figure 1

A System Administrator or Group Administrator needs to go to System > Support Features and choose Update checksum for emails. Only the cases that appear in the Select the case field are affected. The System Administrator or Group Administrator should coordinate the timing of running the Update checksum for emails feature against the affected cases.
Who Will Need This Feature
De-duplication Prerequisites
Figure 2

Running De-duplication
FAQs and Troubleshooting
Q: After upgrading to version 10.1, how do I know if there are cases that need to be scanned?
A: From the case, go to System > Support Features and choose Update checksum for emails. Only the cases that need to be scanned by this feature will appear in the drop-down.
In addition, a warning message (see Figure 1) is shown when you select a case that requires its email checksums to be updated.
Q: If I have cases that appear to be affected, when should I use the Update checksum for emails feature?
A: Preferably, run Update checksum for emails on the case as soon as the warning message appears, or at least before processing any new case data or starting review. The installing System Administrator should check this as in step 1 and consult the Group or Case Administrator for priority and timing.
Q: Which files are scanned by this feature?
A: PST, MSG, and NSF files are scanned by this feature. Attachments and loose files are not affected by the hash mismatch.
Q: What if the job does not complete successfully?
A: In some cases, files are dropped because they are corrupt, missing from the original location, or open in another process. The Group Administrator for the case should read the job log to determine the cause, act accordingly, and then re-run the feature.
Q: Can I run discovery or processing on a case folder in parallel with the Update checksum for emails feature?
A: We do not recommend running another job in parallel with the Update checksum for emails feature. However, if this occurs, the Update checksum for emails job is queued until the discovery or processing job completes, and vice versa.
Q: Can I review or perform an export on the already processed data while the Update checksum for emails job is in progress?
A: No. Reviewing or exporting data while the Update checksum for emails job is in progress is not recommended.
Q: How many Update checksum for emails jobs can I run at a time per case home node?
A: You can run a maximum of five jobs at a time per case home node. You can configure the number of concurrent jobs with the property esa.checksumupdaterjob.execution.throttle. The default value of this property is three. You can set the value from 1 to 5.
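As a rough illustration of the limits described in this answer, the following sketch resolves the throttle value from a set of properties, falling back to the default of three and rejecting values outside 1 to 5. Only the property name, the default, and the range come from this document. The function name, the dictionary input, and the choice to raise an error on an out-of-range value are assumptions; how the platform itself treats an invalid value is not stated here.

```python
PROPERTY_NAME = "esa.checksumupdaterjob.execution.throttle"
DEFAULT_THROTTLE = 3               # default stated in this document
MIN_THROTTLE, MAX_THROTTLE = 1, 5  # allowed range stated in this document


def resolve_throttle(properties: dict) -> int:
    """Return the number of concurrent jobs per case home node.

    Hypothetical helper: uses the default when the property is unset and
    raises ValueError when the value is outside the documented range.
    """
    raw = properties.get(PROPERTY_NAME)
    if raw is None:
        return DEFAULT_THROTTLE
    value = int(raw)
    if not MIN_THROTTLE <= value <= MAX_THROTTLE:
        raise ValueError(
            f"{PROPERTY_NAME} must be between {MIN_THROTTLE} and "
            f"{MAX_THROTTLE}, got {value}"
        )
    return value


print(resolve_throttle({}))                    # property not set
print(resolve_throttle({PROPERTY_NAME: "5"}))  # raised to the maximum
```

Checking the value before applying it avoids silently relying on behavior that this document does not describe for out-of-range settings.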