Description
This document provides design and deployment considerations for implementing eDiscovery Platform on VMware.
Author: Kevin Graves
Scope of Document:
This document aims to provide guidance on designing and deploying eDiscovery Platform on the VMware vSphere platform.
This document should be used in conjunction with other performance and best practice guides as outlined in the “Related Documents” section of this document.
Intended Audience:
This document is aimed at system administrators, solutions architects, and consultants.
It is assumed that the reader has a thorough understanding of the architecture and operational aspects of eDiscovery Platform.
It is also assumed that the reader has experience and understanding of VMware vSphere.
Choosing the Right Platform for Your Environment
Virtualization technology has helped many customers introduce cost savings, both in terms of lowered data center power consumption and reduced cooling requirements. Virtualization typically also simplifies the datacenter landscape through server consolidation, requiring less hardware to provide the same service to end users, with the added benefit of application-independent high availability.
Application architectures are rapidly evolving towards highly distributed, loosely coupled applications. The conventional x86 computing model, in which applications are tightly coupled to physical servers, is too static and restrictive to efficiently support most modern applications. With a virtual deployment, the architecture can be as modular as is appropriate, without expanding the hardware footprint. The dynamic nature of virtual machines means that the design can grow and adapt as required, without the need for an initial "perfect" design.
Virtual deployments typically take minutes, can share currently deployed hardware, and can be adjusted "on the fly" when more resources are required. Certain server applications, however, are less suitable for virtualization, especially those requiring heavy use of physical server resources such as CPU and memory.
Traditionally, customers have been reluctant to place applications with high service level agreements, such as Microsoft Exchange Server and SQL Server, on a virtual platform. One reason was that the application's demand on resources meant that only one or two virtual machines could co-exist on a single server. Another was that the virtual machine could not offer the same performance as a physical server.
A number of factors should be considered before deploying eDiscovery Platform in a VMware environment:
- eDiscovery Platform is heavily dependent on CPU and memory resources. In a typical physical server configuration, it is not unusual for the CPU to run at 90% utilization or higher while data is being ingested, an OCR job is running, or an export is being performed.
- Generally, the more powerful the processor, the better the ingestion and retrieval rates.
- The minimum recommended CPU and memory configuration for a stand-alone eDiscovery Platform server is 32 CPU cores and 128GB RAM. This applies to a server running Collections, Legal Holds and Cluster Master (with no cases).
- If the eDiscovery Platform server will be used as a Worker Node for Pre-processing, Processing, Analysis and Review, the minimum recommended configuration is 24 CPU cores and 96GB RAM.
- It is recommended that CPU and memory resources are dedicated (reserved) and locked to the eDiscovery Platform server, and not shared with other virtual machines on the host.
- Other system components, such as network and storage, need to be sized accordingly to prevent them from becoming a bottleneck.
If the above considerations are acceptable and supported by the customer environment, then it is likely that virtualizing the eDiscovery Platform environment will be a good fit for the organization.
Sizing eDiscovery Platform for VMware
One of the most important considerations when sizing eDiscovery Platform is a thorough understanding of the expected workload on each of the eDiscovery Platform servers. The main factor is the customer's requirements for collecting, processing, reviewing and exporting.
An introduction to eDiscovery Platform design and sizing is outside the scope of this document. In general terms, though, once the customer requirements are understood, a close look at the function of each eDiscovery Platform server will help determine the minimum server resources required.
The most common mistake when designing eDiscovery Platform is to size for capacity rather than for performance. The following sections of this guide provide detail on how to design the various components for an optimal configuration.
**MINIMUM REQUIREMENTS**

| FUNCTION | CPU (cores) | RAM |
|---|---|---|
| Legal Hold Confirmation Server | 16 | 32GB |
| Legal Hold Server | 16 | 32GB |
| Collections Server | 32 | 64GB |
| Collection, Legal Hold and Confirmation Server | 32 | 64GB |
| Pre-Processing, Analysis, Review and Export Server | 24 | 96GB* |
| All features combined eDP Server | 32 | 128GB* |

| FUNCTION | CPU (cores) | RAM |
|---|---|---|
| Cluster Master with MySQL on a separate server | 32 | 128GB* |
| Cluster Master with MySQL | 48 | 128GB* |
| Worker Nodes | 24 | 96GB* |
| Utility Nodes | 4 | 8GB |

* Indicates RAM is Reserved and Locked
Special Considerations:
- Do not place virtual machines with reserved and locked memory on the same host as virtual machines whose resources are not reserved and locked.
- Ensure that the total number of vCPUs assigned to the virtual machines is equal to or less than the total number of physical cores on the ESX host.
- Do not enable Hyperthreading. In most cases it provides little or no benefit to multi-CPU virtual machines, and internal testing has shown that it provides no performance benefit.
- For all other hardware and software recommendations, follow the Veritas Installation Guide for the relevant version of the product.
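The sizing table and the special considerations above can be turned into a quick pre-deployment check. The following Python is a minimal sketch, not a Veritas tool. The minimums are transcribed from the table, while the role keys, VM names and host core counts are hypothetical. It also assumes each VM records whether its memory is actually reserved and locked in vSphere.

```python
"""Minimal pre-deployment sizing check (illustrative sketch, not a Veritas tool).

Minimum vCPU/RAM values are transcribed from the MINIMUM REQUIREMENTS table.
Role keys, VM names and host core counts are hypothetical examples.
"""
from dataclasses import dataclass

# role -> (minimum vCPUs, minimum RAM in GB, table marks RAM as reserved/locked)
MINIMUMS = {
    "legal_hold_confirmation": (16, 32, False),
    "legal_hold": (16, 32, False),
    "collections": (32, 64, False),
    "collection_legal_hold_confirmation": (32, 64, False),
    "preprocess_analysis_review_export": (24, 96, True),
    "all_features": (32, 128, True),
    "cluster_master_remote_mysql": (32, 128, True),
    "cluster_master_with_mysql": (48, 128, True),
    "worker_node": (24, 96, True),
    "utility_node": (4, 8, False),
}


@dataclass
class VM:
    name: str
    role: str
    vcpus: int
    ram_gb: int
    reserved_locked: bool  # is this VM's memory reserved and locked in vSphere?


def check_host(physical_cores: int, vms: list[VM]) -> list[str]:
    """Return rule violations for the VMs placed on one ESX host (empty = OK)."""
    problems = []
    for vm in vms:
        min_cpu, min_ram, needs_lock = MINIMUMS[vm.role]
        if vm.vcpus < min_cpu:
            problems.append(f"{vm.name}: {vm.vcpus} vCPUs < minimum {min_cpu}")
        if vm.ram_gb < min_ram:
            problems.append(f"{vm.name}: {vm.ram_gb}GB RAM < minimum {min_ram}GB")
        if needs_lock and not vm.reserved_locked:
            problems.append(f"{vm.name}: RAM should be reserved and locked")
    # Total vCPUs must be equal to or less than the physical cores on the host.
    total = sum(vm.vcpus for vm in vms)
    if total > physical_cores:
        problems.append(f"host: {total} vCPUs assigned > {physical_cores} cores")
    # Do not mix reserved/locked VMs with non-reserved VMs on the same host.
    if len({vm.reserved_locked for vm in vms}) > 1:
        problems.append("host: mixes reserved/locked and non-reserved VMs")
    return problems


if __name__ == "__main__":
    host = [
        VM("edp-worker-1", "worker_node", 24, 96, True),
        VM("edp-worker-2", "worker_node", 24, 96, True),
    ]
    print(check_host(48, host))  # both rules satisfied: prints []
```

Adding a non-reserved 4-vCPU utility node to the same 48-core host would be reported twice. It pushes the host to 52 vCPUs, and it mixes reserved and non-reserved VMs.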
Throttling: for limited use (smaller or non-production environments)
When resources are limited, throttling (lowering the number of active threads) may also be required to obtain the desired counter values. The following technical articles can be used as a guide when adjusting eDiscovery Platform based on the Performance Monitor counter results.
How to adjust ASM Memory Components
How to Throttle an eDP Process
Using Performance Monitor Counters to assist in sizing
Considerations:
- The information provided by the Performance Reports is an average, so peak resource consumption is higher than the reported values.
- To produce a useful report, run the Performance Monitor data collection only during the action of concern (for example, during data ingestion or while a Collection task is running).
- The Performance Monitor data collection task must be stopped immediately after the action of concern has completed.
Create and execute the Data Collection Set:
- perfmon.msc > Data Collector Sets > User Defined > New > Data Collector Set
- Name: Bottle Necks > Create manually > Next
- Create data logs > Performance counter > Next
- Add (select each of the counters listed below)
- Sample interval: 15 seconds > Next
- Save the report to a folder that is easy to access.
- Start the 'Bottle Necks' Counter
- Start the new data ingestion
- Stop the 'Bottle Necks' Counter immediately after the completion of data ingestion.
- Open the *.blg file(s) in Performance Monitor for analysis.
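For scripted review, a .blg file can be converted to CSV with the Windows relog utility, for example "relog DataCollector01.blg -f csv -o bottlenecks.csv" (both file names are hypothetical). The report values are averages and peaks run higher, so the sketch below reports both the average and the peak of each counter. It assumes the usual Performance Monitor CSV layout: a timestamp in the first column, one counter per remaining column, and blank cells for missing samples.

```python
"""Report the average and the peak of each counter in a Performance Monitor CSV.

Assumes the CSV was produced from a .blg file with relog (file names are
hypothetical). Column 0 holds timestamps; every other column holds one
counter. Blank or non-numeric cells (missing samples) are skipped.
"""
import csv
import io
import statistics
import sys


def summarize(csv_text: str) -> dict[str, tuple[float, float]]:
    """Return {counter path: (average, peak)} for every numeric column."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {}
    header, data = rows[0], rows[1:]
    summary = {}
    for col, counter in enumerate(header[1:], start=1):
        values = []
        for row in data:
            cell = row[col].strip() if col < len(row) else ""
            try:
                values.append(float(cell))
            except ValueError:
                continue  # blank or non-numeric sample
        if values:
            summary[counter] = (statistics.fmean(values), max(values))
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], newline="", encoding="utf-8", errors="replace") as f:
        for counter, (avg, peak) in summarize(f.read()).items():
            print(f"{counter}: average {avg:.2f}, peak {peak:.2f}")
```

Comparing the peak with the average for each counter gives a feel for how far real consumption exceeds the averaged report.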
COUNTERS:
- Memory: Available Bytes
- Memory: Cache Faults/sec
- Memory: Page Faults/sec
- Memory: Page Reads /sec
- Memory: Page Writes /sec
- Memory: Pages/sec
- (select each hard drive individually; do not select _Total or the All instances entry)
- Physical Disk: Avg. Disk Queue Length
- Physical Disk: Avg. Disk Read Queue Length
- Physical Disk: Avg. Disk Write Queue Length
- Physical Disk: Avg. Disk sec/Read
- Physical Disk: Avg. Disk sec/Write
- (select each hard drive individually; do not select _Total or the All instances entry)
- Logical Disk: Avg. Disk Queue Length
- Logical Disk: Avg. Disk Read Queue Length
- Logical Disk: Avg. Disk Write Queue Length
- Logical Disk: Avg. Disk sec/Read
- Logical Disk: Avg. Disk sec/Write
- Paging File: % Usage
- (select each individual processor; do not select _Total or the All instances entry)
- Processor: % Processor Time
- (select each cwjava process, and the OCR process if available)
- Process: Page File Bytes
- System: Processor Queue Length
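The same "Bottle Necks" collector can also be created from the command line with the built-in Windows logman utility. This avoids clicking through the wizard on every server.

The sketch below makes a few assumptions:

- Counter paths use English counter names.
- Instances use the (*) wildcard. This also collects _Total and every process, so ignore _Total and any non-cwjava/OCR processes during analysis.
- The output path is hypothetical.

The sketch only writes the counter file and prints the logman commands; it does not run them.

```python
"""Write the 'Bottle Necks' counter list to a file and build logman commands.

Sketch only: counter paths assume an English Windows install; the (*)
wildcard also collects _Total and every process, which should be ignored
during analysis. The output path below is hypothetical.
"""
from pathlib import Path

COUNTERS = [
    r"\Memory\Available Bytes",
    r"\Memory\Cache Faults/sec",
    r"\Memory\Page Faults/sec",
    r"\Memory\Page Reads/sec",
    r"\Memory\Page Writes/sec",
    r"\Memory\Pages/sec",
    r"\PhysicalDisk(*)\Avg. Disk Queue Length",
    r"\PhysicalDisk(*)\Avg. Disk Read Queue Length",
    r"\PhysicalDisk(*)\Avg. Disk Write Queue Length",
    r"\PhysicalDisk(*)\Avg. Disk sec/Read",
    r"\PhysicalDisk(*)\Avg. Disk sec/Write",
    r"\LogicalDisk(*)\Avg. Disk Queue Length",
    r"\LogicalDisk(*)\Avg. Disk Read Queue Length",
    r"\LogicalDisk(*)\Avg. Disk Write Queue Length",
    r"\LogicalDisk(*)\Avg. Disk sec/Read",
    r"\LogicalDisk(*)\Avg. Disk sec/Write",
    r"\Paging File(*)\% Usage",
    r"\Processor(*)\% Processor Time",
    r"\Process(*)\Page File Bytes",
    r"\System\Processor Queue Length",
]


def write_counter_file(path: Path) -> None:
    """Write one counter path per line, the format read by logman -cf."""
    path.write_text("\n".join(COUNTERS) + "\n", encoding="ascii")


def logman_commands(counter_file: str, output: str,
                    name: str = "Bottle Necks") -> list[list[str]]:
    """Create (15-second samples, binary .blg output), then start and stop."""
    return [
        ["logman", "create", "counter", name, "-cf", counter_file,
         "-si", "00:00:15", "-f", "bin", "-o", output],
        ["logman", "start", name],  # run just before the ingestion starts
        ["logman", "stop", name],   # run as soon as the ingestion completes
    ]


def as_command_line(args: list[str]) -> str:
    """Quote arguments containing spaces for display at a command prompt."""
    return " ".join(f'"{a}"' if " " in a else a for a in args)


if __name__ == "__main__":
    for cmd in logman_commands("counters.txt", r"C:\PerfLogs\BottleNecks"):
        print(as_command_line(cmd))
```

Keeping the start and stop commands separate from the create command mirrors the manual steps. The collector is started immediately before the ingestion and stopped as soon as it completes.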
Analyzing the counters
Note: This analysis is only for eDiscovery Platform performance.
MEMORY:
- Available Bytes (Amount of memory available to the server)
- Cache Faults/sec (This will always be HIGH, >2K)
- Page Faults/sec (This will always be HIGH, >15K)
- Page Reads/sec (Should not exceed 15)
- Page Writes/sec (Should not exceed 80)
- Pages/sec (An average of 20 pages per second or less is normal)
LOGICAL and PHYSICAL DISK:
- Avg. Disk Queue Length (Should not be higher than the number of spindles plus 2)
- Avg. Disk Read Queue Length (Should be less than 2)
- Avg. Disk Write Queue Length (Should be less than 2)
- Avg. Disk sec/Read (Should be under 20 ms; over 50 ms indicates a serious bottleneck)
- Avg. Disk sec/Write (Manufacturer dependent)
PAGING FILE:
- % Usage (Should be below 1.0)
PROCESSOR:
- % Processor Time (Average between 15% and 20%)
PROCESS:
- Page File Bytes (The first instance is Clearwell; for all other processes, the higher the value the more efficient, >9GB)
SYSTEM:
- Processor Queue Length (Should not exceed 2 per CPU. For example, if the server contains 16 CPUs, the count should not exceed 32.)
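The limits above can be applied mechanically to the averaged values from a report. This is a minimal sketch, assuming counter names (without the object prefix) as keys and values already averaged over the collection period. Only the limits stated in this section are encoded. Avg. Disk sec/Read is expressed in seconds, because that is the unit Performance Monitor reports.

```python
"""Flag averaged counter values that exceed the limits in 'Analyzing the counters'.

Sketch only: keys are counter names without the object prefix, and values are
averages over the collection period. Avg. Disk sec/Read is in seconds, which is
how Performance Monitor reports it (0.020 = 20 ms).
"""

# Fixed upper limits transcribed from this section.
LIMITS = {
    "Page Reads/sec": 15,
    "Page Writes/sec": 80,
    "Pages/sec": 20,
    "Avg. Disk Read Queue Length": 2,
    "Avg. Disk Write Queue Length": 2,
    "Avg. Disk sec/Read": 0.020,
    "% Usage": 1.0,  # Paging File
}


def evaluate(averages: dict[str, float], cpus: int, spindles: int) -> list[str]:
    """Return one message per counter whose average is above its limit."""
    limits = dict(LIMITS)
    limits["Avg. Disk Queue Length"] = spindles + 2  # spindles plus 2
    limits["Processor Queue Length"] = 2 * cpus      # 2 per CPU
    flagged = []
    for counter, value in averages.items():
        limit = limits.get(counter)
        if limit is not None and value > limit:
            flagged.append(f"{counter}: {value:g} exceeds {limit:g}")
    if averages.get("Avg. Disk sec/Read", 0.0) > 0.050:
        flagged.append("Avg. Disk sec/Read: over 50 ms, serious disk bottleneck")
    return flagged


if __name__ == "__main__":
    # Hypothetical averages for a 16-CPU server with 4 spindles.
    print(evaluate({"Processor Queue Length": 40, "Page Reads/sec": 10},
                   cpus=16, spindles=4))
```

For the 16-CPU example in the text, a Processor Queue Length average of 40 is flagged against the limit of 32.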
Memory bottlenecks:
- Page Reads/sec is HIGH: memory pages needed by the program are not located in physical RAM and must be read from disk.
Recommendation: Increase the allotted RAM, and reserve and lock the RAM allocated to the VM.
- Pages/sec is HIGH: this counter tracks hard page faults (should not exceed 80). Microsoft states: "If you have a high rate of page faults combined with a high rate of page reads then you may have an issue where you have insufficient RAM given the high rate of hard faults."
Recommendation: If both Pages Input/sec and Page Reads/sec are high, the server has insufficient RAM. Increase the allotted RAM, and reserve and lock it.
HDD bottlenecks: (Verify the issue is not a memory bottleneck first)
- Avg. Disk Read Queue Length and Avg. Disk Write Queue Length are both above 2: the HDD array is inadequate for the workload.
Recommendation: Add more or faster HDDs to the array.
- Avg. Disk Read Queue Length is low, but Avg. Disk Write Queue Length is high (>50): the antivirus exclusions are most likely not in place.
Recommendation: Apply the appropriate AV exclusions as outlined in the technical article: https://www.veritas.com/support/en_US/article.100013987
CPU bottlenecks:
- % Processor Time and Processor Queue Length are both HIGH: the CPU resources are not adequate for the requested workload.
Recommendation: Add more CPUs to the environment.
- Processor Queue Length is HIGH with % Processor Time LOW.
Recommendation: Move the MySQL database to a remote server.
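The bottleneck rules above can be expressed as a small decision function. It is a sketch, not a diagnostic tool. "HIGH" for Page Reads/sec, Pages/sec, the disk queues and Processor Queue Length uses the limits given in this document. The 80% cut-off for a HIGH % Processor Time is a hypothetical value, because the document does not define one. Memory is checked first, and the disk rules are skipped when memory is the bottleneck, following the note on HDD bottlenecks.

```python
"""Map averaged counter values to the bottleneck recommendations in this section.

Sketch only. Limits for Page Reads/sec (15), Pages/sec (80), the disk queues
(2, and 50 for writes) and Processor Queue Length (2 per CPU) come from this
document; the 80% cut-off for HIGH % Processor Time is a hypothetical value.
"""


def diagnose(c: dict[str, float], cpus: int, cpu_high_pct: float = 80.0) -> list[str]:
    """Return recommendations, checking memory before disk as the text advises."""
    findings = []
    if c.get("Page Reads/sec", 0.0) > 15:
        findings.append("memory: increase the allotted RAM; reserve and lock it")
        if c.get("Pages/sec", 0.0) > 80:
            findings.append("memory: high hard faults plus page reads, insufficient RAM")
    else:
        # Disk rules are only checked once a memory bottleneck is ruled out.
        read_q = c.get("Avg. Disk Read Queue Length", 0.0)
        write_q = c.get("Avg. Disk Write Queue Length", 0.0)
        if read_q > 2 and write_q > 2:
            findings.append("disk: add more or faster HDDs to the array")
        elif write_q > 50:  # read queue is low (<= 2) on this branch
            findings.append("disk: apply the anti-virus exclusions")
    queue_high = c.get("Processor Queue Length", 0.0) > 2 * cpus
    busy = c.get("% Processor Time", 0.0) >= cpu_high_pct
    if queue_high and busy:
        findings.append("cpu: add more CPUs")
    elif queue_high:
        findings.append("cpu: move the MySQL database to a remote server")
    return findings
```

Because disk queues grow when the server is paging, the memory check deliberately suppresses the disk recommendations. This avoids adding disks to solve what is really a RAM shortage.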