Components of a Load File
Load files can seem intimidating at first, but they are simpler than they appear. When you import a load file, you are essentially working with a flat text file whose data is arranged into columns and rows.
The Load File Import (LFI) Tool was officially introduced in eDiscovery version 7.0 with support for the CSV and DAT formats. The only difference between the two formats is the field separator: CSV files use a comma, while DAT files use some other character. The field separator is the character that marks where one field ends and the next begins.
One of the first things to establish when working with a load file is its character encoding, which can vary from one load file to another. If the correct encoding is selected, the characters display properly in the preview screen of the LFI tool.
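If the customer can't tell you the encoding, a quick check outside the tool can narrow down the options before you start trying settings in the preview screen. The sketch below is a hypothetical helper, not part of eDiscovery: it looks for a byte order mark (BOM), then tries a strict UTF-8 decode, and otherwise falls back to ISO-8859-1. That fallback can decode any byte sequence, so it is a last resort rather than proof.

```python
import codecs

def guess_encoding(raw: bytes) -> str:
    """Return a best-guess encoding name for the bytes of a load file.

    Heuristic only (hypothetical helper): a BOM is decisive, a clean strict
    UTF-8 decode is a strong hint, and ISO-8859-1 is the fallback because
    every byte sequence decodes under it.
    """
    # Check the UTF-32 BOMs before UTF-16: the UTF-32 LE BOM begins with
    # the same two bytes as the UTF-16 LE BOM.
    boms = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, name in boms:
        if raw.startswith(bom):
            return name
    try:
        raw.decode("utf-8")  # strict by default: raises on invalid byte sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"
```

Pass it the whole file read in binary mode, for example `guess_encoding(open(path, "rb").read())`. A sample cut in the middle of a multi-byte character can make a UTF-8 file look like ISO-8859-1, and a Windows-1252 file will also be reported as ISO-8859-1, so treat the result as a starting point for the preview screen.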
Here is what a line from a load file might look like. I say "might" because the specific characters can differ, but the overall concept remains the same: fields of data separated by specific characters:
þ000000003A214396F78F5E4B92DB85A8D15EFBBCE4E72000.#1.ENRON TEAM ROSTER.DOCþ®þMiddleton, Brandonþ®þDavis, Buster þ®þþ®þþ®þ12312008þ®þ21:11þ®þ0.7.14.27231þ®þMessageþ®þþ®þþ®þþ®þþ®þþ®þþ®þAiken, Sam þ®þtrueþ®þþ®þapplication/vnd.ms-outlookþ®þþ®þþ®þþ®þþ®þþ®þ0.7.14.27231þ®þ0.7.14.27231þ®þ Check this out þ®þþ®þþ®þþ®þþ®þþ®þJack Fredericks þ®þ1þ
If we look closely at this line, we can see the following:
The "þ" is the text qualifier. It tells eDiscovery that everything between a pair of these symbols is a single string of text that should not be parsed further, so any separator character that appears inside it is treated as part of the value.
The "®" is the field separator mentioned earlier. It tells eDiscovery that one field has ended and the next one is starting.
The "þþ" pair is an empty field: there is nothing between the two text qualifiers, which means this particular line has no value for that field.
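To see how the text qualifier, field separator, and empty fields fit together, here is a minimal Python sketch that splits a shortened version of the line above with the standard `csv` module. It assumes the þ and ® characters from this example; another load file would need whatever characters it actually uses, and a CSV file would simply use `separator=","`. It illustrates the concept only and is not how eDiscovery parses load files internally.

```python
import csv
import io

def parse_load_file_line(line: str, separator: str = "®", qualifier: str = "þ") -> list[str]:
    """Split one load file record into its field values."""
    # csv.reader treats `quotechar` as the text qualifier: a separator that
    # appears between a pair of qualifiers is kept as part of the value.
    reader = csv.reader(io.StringIO(line), delimiter=separator, quotechar=qualifier)
    return next(reader)

sample = "þDOC001þ®þMiddleton, Brandonþ®þþ®þ12312008þ®þ21:11þ"
print(parse_load_file_line(sample))
# ['DOC001', 'Middleton, Brandon', '', '12312008', '21:11']
```

The empty third value comes from the "þþ" pair, which is exactly what an empty field looks like in the raw line.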
Some load files have a header line, which helps you work out what each column contains. Even so, matching columns to headers by eye in the raw text file can be confusing.
As mentioned earlier, these characters can vary from load file to load file, so how does eDiscovery know which character is a separator and which is not? We manually specify this in the Load File Import Tool.
A top tip for working with load files, especially on production systems, is to test the load file in a test case beforehand so that you become familiar with its fields, its separators and delimiters, and how it will look once imported. Because load files have so many configurable options, and these can vary from one customer to another, it is important to take the time to understand the files you've been provided. Mistakes can happen, and proper preparation is vital to prevent more serious problems later in the review workflow.
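One check you can run on a copy of the load file before loading it into a test case is to count the fields in every record: a record whose field count differs from the header usually points to a wrong separator, a stray qualifier, or an embedded line break. The sketch below is a hypothetical helper built on Python's `csv` module, with the þ and ® characters from the example line as defaults; adjust them, and the encoding you open the file with, for the file in hand.

```python
import csv
import io
from collections import Counter

def field_count_report(text: str, separator: str = "®", qualifier: str = "þ"):
    """Return (header, histogram of field counts, mismatched records) for load file text."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=separator, quotechar=qualifier)
    header = next(reader)
    counts: Counter = Counter()
    # (record number, field count) for records that don't match the header;
    # the header is record 1, and a record can span several physical lines.
    mismatched: list[tuple[int, int]] = []
    for record_no, row in enumerate(reader, start=2):
        counts[len(row)] += 1
        if len(row) != len(header):
            mismatched.append((record_no, len(row)))
    return header, counts, mismatched
```

Read the file with `open(path, encoding=..., newline="")` and pass in its text. Very long fields (extracted text, for instance) can exceed the `csv` module's default field size limit, which `csv.field_size_limit()` can raise.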
Load File Import Tool
The first thing you will notice when you add a load file, provided you've selected the correct Top Level Source Directory, is that you are told the number of load files found and asked to select one:
The drop-down box lets you select a different one. A top tip is to make sure you've selected the right one or, even better, to keep different load files in separate directories. This saves you from mistakes later, such as doing all the work of mapping fields for the wrong file or, worse, processing the wrong load file with the wrong mapping criteria.
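If you are unsure which files under a directory could be picked up, a short script can list them before you point the LFI tool at it. This is a hypothetical helper: the extension list (`.dat`, `.csv`) is an assumption based on the two formats named earlier, and it does not reproduce how the LFI tool itself detects load files.

```python
from pathlib import Path

# Assumption: treat the two formats named earlier as the load file extensions.
LOAD_FILE_EXTENSIONS = {".dat", ".csv"}

def find_load_files(top_level_dir: str) -> list[Path]:
    """List files anywhere under a directory whose extension suggests a load file."""
    root = Path(top_level_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in LOAD_FILE_EXTENSIONS
    )
```

More than one result is a cue to double-check the selection in the drop-down box, or to move the extra files into their own directories.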
This part of the LFI tool is where you tell eDiscovery which character stands for which separator or delimiter:
Notice that you can also select the file encoding for the load file. If your customer has already told you what it is, this step is easy. If not, you can use the preview screen below to test the different options. When the encoding is wrong, you will see garbled text like this:
In this example, changing the encoding to UTF-8 instead of ISO-8859-1 will give us this:
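The garbled text is what you get when bytes written in one encoding are decoded as another. The round trip below reproduces the effect in Python: the UTF-8 bytes of a made-up string are decoded as ISO-8859-1, which turns each two-byte character into two wrong characters, while decoding the same bytes as UTF-8 restores the original.

```python
# Every non-ASCII character in this string takes two bytes in UTF-8.
original = "Müller þ®þ"
raw = original.encode("utf-8")

wrong = raw.decode("iso-8859-1")  # what a wrong encoding setting shows
right = raw.decode("utf-8")       # what the correct setting shows

print(wrong)  # MÃ¼ller Ã¾Â®Ã¾
print(right)  # Müller þ®þ
```

The pattern of an "Ã" or "Â" in front of an accented character is a good sign that a UTF-8 file is being read as ISO-8859-1.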
We are still not done, though. Here is what the preview screen looks like after we have selected the correct separators and delimiters:
Apart from the field and text delimiters, there are also multi-value, nested-value, and hierarchy delimiters. Multi-value delimiters are used for email fields, since these can hold multiple values (several recipients, for example), while nested-value delimiters are used by certain third-party load files to indicate nested values, such as levels within a tagging structure. Hierarchy delimiters tell eDiscovery the path to the text or native files. When all of these are specified, the preview screen gives us a usable representation of the load file. Escape characters can also be crucial for certain load files to import successfully. Refer to the Load File Import Guide for more information on each of these character elements.
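Once the field separator and text qualifier have split a record into fields, the multi-value and nested-value delimiters apply within a single field. The sketch below shows that second level of splitting. The semicolon and backslash delimiters, the sample values, and the helper names are assumptions for illustration only; use the characters given in the load file specification, and see the Load File Import Guide for how eDiscovery itself applies them.

```python
def split_multi_value(field: str, multi_value_delim: str = ";") -> list[str]:
    """Split a field such as a recipient list into its individual values."""
    return [value.strip() for value in field.split(multi_value_delim) if value.strip()]

def split_nested(value: str, nested_delim: str = "\\") -> list[str]:
    """Split one nested value (for example a tag path) into its levels, top level first."""
    return [level for level in value.split(nested_delim) if level]

recipients = split_multi_value("Davis, Buster; Aiken, Sam")
tags = [split_nested(tag) for tag in split_multi_value("Responsive\\Hot;Privileged")]
print(recipients)  # ['Davis, Buster', 'Aiken, Sam']
print(tags)        # [['Responsive', 'Hot'], ['Privileged']]
```

Note that the comma inside each recipient name survives because it is neither the field separator nor the multi-value delimiter.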
Note that the preview screen does not pre-select anything. Because load files are so diverse and configurable, it would be very bad practice to assume what any column means. You are given complete control over which fields map to which columns.
Further tabs help us match the columns in the load file with fields in eDiscovery. Most of the standard fields are already available for you to match, but if the file includes custom fields that you need for your own eDiscovery purposes, you match those last, on the Custom Fields tab:
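Conceptually, the mapping built on these tabs is a lookup from load file columns to eDiscovery fields, with nothing mapped until you say so. The sketch below models that idea with plain dictionaries; all column and field names (`BEGDOC`, `DocumentId`, `REVIEW_BATCH`, and so on) are hypothetical, and the real mapping is configured in the LFI tool rather than in code.

```python
# Hypothetical mapping from load file header names to eDiscovery field names.
# Columns that are not listed are not imported.
FIELD_MAPPING = {
    "BEGDOC": "DocumentId",
    "CUSTODIAN": "Custodian",
    "DATESENT": "DateSent",
}
# Mapped last, in the spirit of the Custom Fields tab.
CUSTOM_FIELD_MAPPING = {"REVIEW_BATCH": "ReviewBatch"}

def apply_mapping(header: list[str], row: list[str]) -> dict[str, str]:
    """Turn one parsed record into {field: value}, skipping unmapped columns."""
    mapping = {**FIELD_MAPPING, **CUSTOM_FIELD_MAPPING}
    return {mapping[col]: value for col, value in zip(header, row) if col in mapping}

def unmapped_columns(header: list[str]) -> list[str]:
    """List columns that no mapping covers, so none are dropped by accident."""
    return [col for col in header if col not in FIELD_MAPPING and col not in CUSTOM_FIELD_MAPPING]
```

Reviewing the unmapped columns before saving is the code equivalent of not assuming what any column means.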
Documents can be related in many ways, so document relationships can take one of the following formats:
1. Child
2. Parent ID
3. Range
4. Family Range
5. No relationship
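As an example of the Parent ID format, each child record carries its parent's identifier in a dedicated column, so families can be rebuilt by grouping on that column. The sketch assumes hypothetical `DOCID` and `PARENTID` column names and an empty `PARENTID` for top-level documents; the range-based formats describe the same relationships differently.

```python
from collections import defaultdict

def build_families(records: list[dict[str, str]]) -> dict[str, list[str]]:
    """Map each parent DOCID to the DOCIDs of its direct children."""
    families: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        parent = rec.get("PARENTID", "")
        if parent:
            families[parent].append(rec["DOCID"])
        else:
            # Make sure top-level documents without children still appear.
            families.setdefault(rec["DOCID"], [])
    return dict(families)
```

A parent ID that appears as a key but never as a `DOCID` in the file is worth raising with the customer before import.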
Document identifiers can also have prefix and suffix sections, and these are something the customer has to provide us. Sometimes customers want to search documents in a particular way, and the prefix helps them do that.
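An identifier with a prefix and suffix can be thought of as three parts: a fixed prefix, a zero-padded number, and an optional suffix. The sketch below splits an identifier into those parts given the prefix and suffix the customer supplied, and rejects anything that does not match. The `ABC` prefix and sample values are made up, and treating the middle part as digits only is an assumption that fits typical Bates numbering but may not fit every customer.

```python
import re
from typing import NamedTuple, Optional

class Identifier(NamedTuple):
    prefix: str
    number: int
    width: int  # digit count, so the zero padding can be reproduced
    suffix: str

def parse_identifier(value: str, prefix: str, suffix: str = "") -> Optional[Identifier]:
    """Split e.g. 'ABC0001234' into prefix, number, and suffix; None if it doesn't match."""
    pattern = re.escape(prefix) + r"(\d+)" + re.escape(suffix)
    match = re.fullmatch(pattern, value)
    if match is None:
        return None
    digits = match.group(1)
    return Identifier(prefix, int(digits), len(digits), suffix)
```

Because the match must cover the whole value, an identifier with a different prefix or an extra trailing character returns `None` instead of silently parsing.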
In eDiscovery we also differentiate between loose files and emails/attachments. For emails we try to identify the participants and the threads, so we do not read these two file types the same way. In LFI we ask the customer to identify which line items are emails, contacts, or notes by specifying a column; if that column is populated for a line item, we can tell eDiscovery to treat it as that type of file.
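The idea of flagging a record type by whether a column is populated can be sketched as follows. The indicator column names (`EMAIL_FROM`, `CONTACT_NAME`, `NOTE_TITLE`), their order of precedence, and the fallback to a loose file are all assumptions for illustration; in practice the customer tells you which column to use and you configure it in the LFI tool.

```python
# Hypothetical indicator columns, checked in order; the first populated one wins.
TYPE_INDICATORS = [
    ("EMAIL_FROM", "email"),
    ("CONTACT_NAME", "contact"),
    ("NOTE_TITLE", "note"),
]

def classify_record(record: dict[str, str]) -> str:
    """Return the record type implied by the first populated indicator column."""
    for column, record_type in TYPE_INDICATORS:
        # Whitespace-only values count as empty.
        if record.get(column, "").strip():
            return record_type
    return "loose file"
```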
Load File Troubleshooting
In some instances, the same load file contains documents with slightly different prefixes, and this can cause problems in the LFI tool. There is no easy way to find these apart from looking at the logs. Depending on what the error says, you can then add an extra prefix line in the LFI tool. Essentially, this is a process of trial and error: save the load file, let the discovery complete (or fail), and then check the log files to see whether it succeeded and, if not, why.
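The logs remain the authoritative record of what the LFI tool rejected, but a quick census of the identifier prefixes in the raw file can shorten the trial and error, because every variant shows up in one pass. This sketch assumes a prefix is everything before the trailing run of digits (a simplification that ignores suffixes), and the sample values are invented.

```python
import re
from collections import Counter

def prefix_census(identifiers: list[str]) -> Counter:
    """Count how often each prefix (the text before the trailing digits) occurs."""
    census: Counter = Counter()
    for ident in identifiers:
        match = re.fullmatch(r"(.*?)(\d+)", ident)
        # Identifiers without trailing digits are counted under their full value.
        census[match.group(1) if match else ident] += 1
    return census
```

Print the keys with `repr()` so that invisible characters, such as a byte order mark stuck to the first identifier, stand out as a separate prefix.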
In a nutshell, the method should be:
1. Set up the load file in a test case (or back up the existing case before adding the new load file source)
2. Configure the relevant options and map the fields
3. Save the load file and check for errors
a. Note that neither the activity log nor the server log provides enough information. The log file to check here is the remotejob log for that particular case. You can find it in the UI by going to Case Home and then clicking on logs.
4. Look for any errors and make a note of what they relate to:
a. The most frequent error you will find is an invalid Bates number, which can happen for any number of reasons. For example, in DAT files from Concordance, a hidden character might appear beside the DOCID so that it no longer matches the prefix you specified. If that is the case, the customer needs to remove those characters with a hex editor or regenerate the DAT file without them.
5. Once you have worked your way through the different errors, save the load file source again and repeat steps 3 and 4 until the load file discovers successfully.
6. Once that is done, you will be ready to process the source.
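For the hidden-character case in step 4a, you can scan the identifier column for characters that don't print before going back to the customer. The sketch below flags any character in a Unicode "Other" general category (control, format, surrogate, private use, unassigned), which covers ASCII control codes and the byte order mark, plus whitespace at either end of the value. The sample values are invented, and whether a given character actually breaks an import is something to confirm against the remotejob log.

```python
import unicodedata

def hidden_characters(identifier: str) -> list[tuple[int, str]]:
    """Return (position, code point) for each non-printing character in an identifier."""
    found = []
    last = len(identifier) - 1
    for pos, char in enumerate(identifier):
        # General categories starting with "C" (Cc, Cf, Cs, Co, Cn) do not print.
        non_printing = unicodedata.category(char).startswith("C")
        edge_space = char.isspace() and pos in (0, last)
        if non_printing or edge_space:
            found.append((pos, f"U+{ord(char):04X}"))
    return found
```

The reported positions and code points give the customer something concrete to look for in a hex editor.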
Please note:
This document assumes the reader has minimal experience working with load files but is familiar with the eDiscovery product and the technical documentation on ingesting load files into the product. It does not offer a definitive account of the import process, but it should give the reader a better understanding of it and a framework for troubleshooting load file issues.
This document is not a substitute for the eDiscovery load file import documentation; rather, it complements it as a commentary on the process, with some useful insights. For specific information about aspects of the LFI tool, please refer to the eDiscovery documentation.