Unexpected highlighting in messages with Excel attachments

Article ID: 100074690

Description

Error Message

Review the Enterprise Vault Converters event log on the EV servers for the time frame in which the messages were archived, and look for Event ID 28993 and/or Event ID 29132 entries with the following description:

Unable to convert item content

Reason: The converted content is too large to process      (0xc0041bd0)
Supplementary Info:  

Item: SA:-X
Subject: .xlsx
Attachment:  
Type: xlsx
Vault ID: 123ABC456DEF789GHI123ABC456DEF789GHI0000EVSiteAlias
SaveSet ID: 190001011234567~190001010000000000~Z~1234ABCD12AB34CD56EF123456ABCDEF
Attachment ID: X

This item will be archived without a preview being available to the web application and the content will not be indexed. It is not possible to search on the content but the item can be restored as normal

V-437-28993
V-437-29132
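
When many of these events need to be triaged, the fields in the description can be extracted programmatically. The following is a minimal Python sketch, assuming the event description has already been exported from the event log as plain text (the export step itself is not covered here). The helper names `parse_event_description` and `is_oversize_conversion_failure` are hypothetical and are not part of Enterprise Vault.

```python
import re

# Sample description text, copied from the event shown above.
SAMPLE_EVENT = """\
Unable to convert item content

Reason: The converted content is too large to process      (0xc0041bd0)
Supplementary Info:

Item: SA:-X
Subject: .xlsx
Attachment:
Type: xlsx
Vault ID: 123ABC456DEF789GHI123ABC456DEF789GHI0000EVSiteAlias
SaveSet ID: 190001011234567~190001010000000000~Z~1234ABCD12AB34CD56EF123456ABCDEF
Attachment ID: X
"""

# A field line is "Key: value", where the key contains only letters and spaces.
FIELD_PATTERN = re.compile(
    r"^(?P<key>[A-Za-z ]+?):[ \t]*(?P<value>.*)$", re.MULTILINE
)


def parse_event_description(text):
    """Return the 'Key: value' lines of an event description as a dict."""
    fields = {}
    for match in FIELD_PATTERN.finditer(text):
        fields[match.group("key").strip()] = match.group("value").strip()
    return fields


def is_oversize_conversion_failure(fields):
    """True when the Reason field reports that the converted content was too large."""
    return "too large to process" in fields.get("Reason", "")
```

Calling `parse_event_description(SAMPLE_EVENT)` returns a dictionary whose `Vault ID` and `SaveSet ID` entries can then be used to locate the affected item.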

 

Cause

Excel files, also known as workbooks, typically contain one or more worksheets, also known as spreadsheets or sheets. Excel files are stored in the XLSX format, which is a compressed, XML-based format. The file size listed for an XLSX file is the compressed size, not the size of the information in the file. The data in the file consists of the file's metadata, worksheets, user-entered data, formatting, formulas, data sources, data sorting criteria, filters, etc. Once extracted, an Excel file is considerably larger than its XLSX size indicates.

For example, a 32 MB workbook, which is fairly large, contains approximately 200 MB of data; a 75 MB workbook contains about 440 MB of data. In both cases the extracted data is roughly six times the XLSX file size.

Either of the following two methods can be used to determine the size of the data in an XLSX workbook. The size of the resulting files shows the size of the data contained within the XLSX file.

1. Open the spreadsheet in Excel | File | Save As | Browse | Select the location | Save as type: Single File Web Page (*.mht, *.mhtml) | Save: Entire Workbook | Save.

2. With 7-Zip installed, right-click the spreadsheet | 7-Zip | Extract to "\".
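
The second method can also be approximated without extracting anything to disk. Because an XLSX file is a ZIP archive, the archive's directory already records the uncompressed size of every part. The following is a minimal Python sketch that uses only the standard `zipfile` module; the function names `xlsx_sizes` and `expansion_ratio` are hypothetical, and the result reflects the size of the extracted parts (as with 7-Zip), not the size of an MHT export.

```python
import zipfile


def xlsx_sizes(path):
    """Return (compressed_bytes, uncompressed_bytes) summed over all parts in the archive."""
    with zipfile.ZipFile(path) as archive:
        infos = archive.infolist()
        compressed = sum(info.compress_size for info in infos)
        uncompressed = sum(info.file_size for info in infos)
    return compressed, uncompressed


def expansion_ratio(path):
    """Return uncompressed size divided by compressed size (0.0 for an empty archive)."""
    compressed, uncompressed = xlsx_sizes(path)
    return uncompressed / compressed if compressed else 0.0


# Example usage (the file name is a placeholder):
#   compressed, uncompressed = xlsx_sizes("workbook.xlsx")
#   print(f"{compressed / 2**20:.1f} MB compressed, "
#         f"{uncompressed / 2**20:.1f} MB extracted")
```

The summed compressed size is slightly smaller than the size of the file on disk, because the ZIP container adds its own headers.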


The unexpected highlighting with Excel attachments is due to the following causes:

1. Issues with EV content conversion during archiving when extracting the actual data in the file for Classification. When an XLSX file grows beyond even a small file size, typically 10 MB or less, the size of the extracted data can cause issues with content conversion. As a result, Classification may not process the file correctly and therefore may not provide the expected Tag Highlighting to CA/Surveillance.

2. Issues with data extraction in the Surveillance UI. When an XLSX file grows beyond even a small file size, typically 10 MB or less, the size of the extracted data can cause excessive memory usage during the rendering process. This may result in unexpected or incorrect highlighting, and in unexpected or incorrect Tags/Hotwords statistics.

Note that while the behavior is most likely to occur with Excel files larger than 10 MB in size, the behavior has also been seen with Excel files as small as 1 MB in size where such small workbooks had a large number of sheets. The number of sheets can significantly increase the extracted data size.
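
To see how much of a workbook's extracted size comes from its sheets, the worksheet parts inside the XLSX archive can be counted and sized. The following is a minimal Python sketch, assuming the conventional layout in which Excel stores each worksheet as an XML part under `xl/worksheets/`; workbooks written by other tools may use different part names. The function name `worksheet_report` is hypothetical.

```python
import zipfile


def worksheet_report(path):
    """Count worksheet parts in an XLSX archive and sum their uncompressed sizes."""
    with zipfile.ZipFile(path) as archive:
        sheets = [
            info
            for info in archive.infolist()
            # Assumption: worksheets live directly under xl/worksheets/ as *.xml;
            # relationship files (*.rels) in the _rels subfolder are excluded.
            if info.filename.startswith("xl/worksheets/")
            and info.filename.endswith(".xml")
        ]
    return {
        "sheet_count": len(sheets),
        "sheet_bytes": sum(info.file_size for info in sheets),
    }
```

A small XLSX file that reports a high `sheet_count` is a candidate for the behavior described in the note above.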

The issue was first discovered in version 14.5.2, and may be present in other versions.

 

Resolution

There is currently no solution for the content conversion issues or the excessive memory usage during UI rendering. There are limitations within the conversion engine that cannot be overridden or bypassed. The rendering engine's memory usage is currently as optimised as possible whilst taking into consideration all the various file types that could be displayed in the CA/Surveillance UI. Some Excel XLSX files are simply too large when extracted to render efficiently. As such, the Item Statistics pane may not display the expected hit counts and/or statistics under Tags and/or Hotwords, the Hit Navigation may not be as expected, and the Hotword Hit Highlighting/Classification Tag Highlighting may not display as expected.

For such Excel attachments, it is recommended to download the attachment(s) and review them in Excel.

At this time, there are no plans to address this issue through a patch or hotfix in the current or previous versions of the software. Although this issue may potentially be resolved in a future major revision, it is not currently scheduled for any upcoming release. If this issue has a significant business impact on your continued use of the product, please contact your Sales representative or the Sales team to discuss your concerns.

A review is under way to determine if this issue can be detected when the message is selected in the UI so as to display a message with more information.

This issue is presently under investigation by our Engineering Team. Depending on the results of this investigation, it may be addressed through a patch or hotfix in current or future software releases. However, this specific issue is not currently scheduled for inclusion in any upcoming release.
 
Note: Customers experiencing this issue are encouraged to contact Veritas Technical Support as data is still being collected to assist in resolving this issue.

 

Issue/Introduction

Unexpected highlighting may be seen in Enterprise Vault (EV) Compliance Accelerator (CA)/Surveillance when messages contain Excel attachments. The messages may contain a small number of Excel files that are large in individual size, or a large number of Excel files that are small in individual size. The unexpected highlighting can manifest itself in the following ways:
- The Item Statistics pane may not display the expected hit counts and/or statistics under Tags and/or Hotwords.
- The Hit Navigation may not be as expected.
- Hotword Hit Highlighting does not highlight the expected terms or highlights unexpected terms.
- Classification Tag Highlighting does not highlight the expected Policy Condition values and/or Pattern values. This may be seen for one or more Policies and may vary from message to message, where one message correctly highlights a Policy's Condition values and/or Pattern values, whereas another message with the same Policy applied may not show the expected highlighting. Similarly, a message with multiple Policies applied may display the expected highlighting for one Policy's Condition values and/or Pattern values, but may not display the expected highlighting for another Policy's values.

Additional Information

JIRA: CFT-7252
JIRA: CFT-7253