How to resolve Checksum Deduplication issues using the 'Update checksum for emails' feature


Article ID: 100039050



Description


Update Checksum for Emails

Background
Deduplication of documents depends on matching checksums. The Microsoft Office and Lotus Notes clients have changed the way they calculate checksums between versions. As a result, the same email indexed under different client versions can produce different checksums, and some email documents may not be deduplicated.
For example, release 8.1 of the Veritas eDiscovery Platform upgraded the version of Microsoft Office it uses. In cases with data indexed before 8.1, emails indexed in 8.1 or later may not deduplicate fully against the emails already in the case. Release 8.2 CHF4 adds the “Update Checksum for Emails” feature, which is available to a system manager (or group manager).
IMPORTANT: After upgrading to 8.2 CHF4, the system manager should check whether any data needs the checksum update. Go to System > Support Features and choose Update checksum for emails. Only the cases that appear in the Select the case field are affected. The Case Administrator should coordinate when the Update Checksum for Emails feature is run against the affected cases.
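
The mismatch described in this background can be illustrated with a minimal Python sketch. It is not the platform's actual algorithm: the `checksum` function (SHA-256 over raw bytes) and the two sample renderings are assumptions made up for illustration. The point is only that if two client versions serialize the same email differently, the checksums differ and checksum-based deduplication treats the copies as distinct.

```python
import hashlib


def checksum(rendered_email: bytes) -> str:
    """Hypothetical stand-in for the platform's email checksum."""
    return hashlib.sha256(rendered_email).hexdigest()


def deduplicate(renderings):
    """Keep one item per distinct checksum, preserving first-seen order."""
    seen = {}
    for item in renderings:
        seen.setdefault(checksum(item), item)
    return list(seen.values())


# The same logical email as two client versions might render it.
# The difference shown (line endings) is invented for illustration only.
old_rendering = b"From: a@example.com\nSubject: Q3 report\n\nSee attached."
new_rendering = b"From: a@example.com\r\nSubject: Q3 report\r\n\r\nSee attached."

unique_same_version = deduplicate([old_rendering, old_rendering])
unique_mixed_versions = deduplicate([old_rendering, new_rendering])

print(len(unique_same_version))    # copies from one version collapse
print(len(unique_mixed_versions))  # copies from two versions do not
```

Recomputing the stored checksums with the current method, which is what the Update checksum for emails feature is described as doing, brings both copies back to a common checksum so they can match again.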

Who Will Need This Feature
• Users whose cases contain data indexed in a version of the eDiscovery platform earlier than 8.1 and who have indexed more data after upgrading. Besides the deduplication impact, some users may have experienced retrieval errors after upgrading to 8.1 or later.
• Users who are considering upgrading to 8.1 or later and have cases archived in an earlier version. If such a case is restored and more data is indexed, this feature allows the System or Group administrator to check for and remediate any deduplication impact.

Deduplication Prerequisites
• The user must have the System Manager or Group Admin role to access the System page with the support features.
• The Update Checksum for Emails feature is available starting in version 8.2 CHF4. The data to be updated should be on a system running the applicable version of the Veritas eDiscovery Platform.


Figure 1



Enable the Update checksum for emails feature

Note: Enabling is mandatory and needs to be done once per appliance.

1. Go to System > Support Features > Property Editor. (This is an advanced support feature that requires Technical Support assistance.)

2. Choose the appliance.

3. Check if these properties exist. If not, add them with these values:

esa.support.emailchecksumupdaterfeature.class = com.teneo.esa.processing.deduplicatechecker.EmailChecksumUpdaterSupportFeature

esa.support.emailchecksumupdaterfeature.restricted = false

4. Restart services.
Note: Repeat steps 1-4 for all appliances in the cluster.
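
When several appliances have to be checked, it can help to compare each appliance's values against the two required properties in one pass. The sketch below is an assumption-laden convenience, not a product tool: it assumes the property values were copied by hand from the Property Editor into `key = value` text, and the helper names (`parse_properties`, `missing_or_wrong`) are hypothetical.

```python
# Required properties and values, taken from step 3 above.
REQUIRED = {
    "esa.support.emailchecksumupdaterfeature.class":
        "com.teneo.esa.processing.deduplicatechecker."
        "EmailChecksumUpdaterSupportFeature",
    "esa.support.emailchecksumupdaterfeature.restricted": "false",
}


def parse_properties(text):
    """Parse 'key = value' lines; skip blanks, comments, and other lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_or_wrong(text):
    """Return the required properties that are absent or set differently."""
    props = parse_properties(text)
    return {k: v for k, v in REQUIRED.items() if props.get(k) != v}


# Invented sample: one property has the wrong value, the other is absent.
sample = """
esa.support.emailchecksumupdaterfeature.restricted = true
"""
problems = missing_or_wrong(sample)
for key, expected in problems.items():
    print(f"{key} should be set to {expected}")
```

An empty result means both properties are present with the expected values. The properties still have to be added or corrected in the Property Editor itself.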


Running Deduplication
1. From the case, go to System > Support Features and choose Update checksum for emails. The cases that appear in the dropdown are those that need to be scanned by this feature. Click the Submit button to start a job. To observe progress, select the case and click the job log; otherwise, wait for the job to complete.

2. When the job completes, the log states Job Finished, and a log summary and a CSV file are generated.
Job log: Shows the steps in progress, the number of files deduplicated, whether the update completed successfully, and where the CSV file is available.
Log summary: Generated after the job is complete. It restates the last three lines of the job log: the number of files of each type to be deduplicated, the number of successfully deduplicated items, and a statement that the job is complete.
CSV “Finalresult”: A detailed list of updated items identified by DocID, with the original and updated crawler checksum IDs.
Note: If items fail to update successfully, the Group admin for the case should investigate the job log for details.

3. Once the problem has been addressed, scan the case again with the Update checksum for emails feature, as in steps 1-2. After the case has been updated completely, it no longer appears in the dropdown.
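
For large cases, the “Finalresult” CSV from step 2 may be easier to review with a short script than by eye. The following is a sketch only: the article says the file lists each item by DocID with its original and updated crawler checksum IDs, but the exact header names are not given, so `DocID`, `OriginalChecksum`, and `UpdatedChecksum` are hypothetical and must be adjusted to the real header row. The sample rows are invented.

```python
import csv
import io

# Hypothetical header names; check the real file's header row and adjust.
DOC_ID = "DocID"
ORIGINAL = "OriginalChecksum"
UPDATED = "UpdatedChecksum"


def summarize(csv_file):
    """Split DocIDs into those whose checksum changed and those unchanged."""
    changed, unchanged = [], []
    for row in csv.DictReader(csv_file):
        target = changed if row[ORIGINAL] != row[UPDATED] else unchanged
        target.append(row[DOC_ID])
    return changed, unchanged


# Invented sample data standing in for the generated CSV file.
sample = io.StringIO(
    "DocID,OriginalChecksum,UpdatedChecksum\n"
    "doc-1,aaa111,bbb222\n"
    "doc-2,ccc333,ccc333\n"
)
changed, unchanged = summarize(sample)
print(f"{len(changed)} changed, {len(unchanged)} unchanged")
```

To run it against a real file, pass an open file object instead of the in-memory sample, for example `open(path, newline="")`, where `path` is the CSV location reported in the job log.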


FAQs and Troubleshooting
Q: After upgrading to 8.2 CHF4, how do I know if there are cases that need to be scanned?
A: Go to System > Support Features and choose Update checksum for emails. The cases that appear in the dropdown are those needing to be scanned.

Q: If I have cases that appear to be affected, when should I use the Update checksum for emails feature?
A: If the case has incoming documents that need indexing, run the feature before processing any new case data. The installing System Administrator should check this as in step 1, and consult the Group or Case administrator for priority and timing.

Q: Which files are scanned by this feature?
A: PST, MSG, and NSF files. Attachments are not affected by the hash mismatch.

Q: What if the job does not complete successfully?
A: In some cases, files are dropped because they are corrupt, missing from the original location, or open in another process. The Group admin for the case should read the job log to determine the cause and act accordingly, then re-run the feature.
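
Two of the causes named in this answer, missing files and files that cannot be opened, can be checked before re-running the feature. The sketch below is a generic file-system check, not part of the product: `check_sources` is a hypothetical helper, the file names are placeholders, and it cannot detect corruption inside a PST, MSG, or NSF file. Whether a file held open by another process fails to open depends on the operating system and how that process locked it.

```python
import os
import tempfile


def check_sources(paths):
    """Return {path: reason} for files that are missing or cannot be opened."""
    problems = {}
    for path in paths:
        if not os.path.isfile(path):
            problems[path] = "missing"
            continue
        try:
            # Reading one byte confirms the file can be opened for reading.
            with open(path, "rb") as handle:
                handle.read(1)
        except OSError as exc:
            problems[path] = f"cannot open: {exc.strerror}"
    return problems


# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "mailbox.pst")
    with open(present, "wb") as f:
        f.write(b"placeholder")
    absent = os.path.join(tmp, "archive.nsf")
    result = check_sources([present, absent])

print(result)
```

In practice the list of paths would come from the source locations named in the job log.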

Q: Can I run discovery or processing on a case folder in parallel with the Update checksum for emails feature?
A: Running any job in parallel with the Update checksum for emails feature is not recommended. If it does happen, the Update checksum for emails job is queued until the discovery/processing job completes, and vice versa.

Q: Can I review or perform an export on the already processed data while the Update checksum for emails job is in progress?
A: Yes. However, a document being reviewed or exported may be in use by the Update checksum for emails feature at the same time. In that case, retrieval errors may appear while reviewing a document, and some documents may fail to export. If the export contains errors caused by running in parallel with the Update checksum for emails job, retrying the export resolves the issue.
