Lucene corruption causing 'Generating Term Vectors' exception during Search Analytics phase of Post-processing

Article ID: 100013349

Description

Error Message

The following exception appears in the Search Analytics log, located in 'D:\CW\v##\data\esadb\case-logs\':

2014-05-14 00:23:24,635 INFO  [additions.index.TermVectorGenerator] (Thread-37:) Generating Term Vectors...
2014-05-14 00:23:24,635 INFO  [additions.index.TermVectorGenerator] (Thread-37:) Attempting to fetch the docVS from the local ProcessingDS [D:\CW\V714\data\esadb\dataStore_index_a6ymtt4nch_43173867\consolidation\ObjectStore\docvectors.bin]
2014-05-14 00:23:46,772 WARN  [STDERR] (Thread-37:) java.io.IOException: Invalid vInt detected (too many bits)
2014-05-14 00:23:46,772 WARN  [STDERR] (Thread-37:)   at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:215)
2014-05-14 00:23:46,772 WARN  [STDERR] (Thread-37:)   at org.apache.lucene.store.DataInput.readString(DataInput.java:184)
2014-05-14 00:23:46,772 WARN  [STDERR] (Thread-37:)   at com.teneo.esa.analytics.semanticvector.upitt.IndexedVectorStoreReader$IndexedVectorEnumeration.next(IndexedVectorStoreReader.java:390)
2014-05-14 00:23:46,772 WARN  [STDERR] (Thread-37:)   at com.teneo.esa.analytics.semanticvector.upitt.IndexedVectorStoreReader$IndexedVectorEnumeration.next(IndexedVectorStoreReader.java:343)
2014-05-14 00:23:46,772 WARN  [STDERR] (Thread-37:)   at com.teneo.esa.analytics.semanticvector.additions.searcher.DefaultVectorStoreReader.getInMemoryVectorStore(DefaultVectorStoreReader.java:47)
2014-05-14 00:23:46,772 WARN  [STDERR] (Thread-37:)   at com.teneo.esa.analytics.semanticvector.additions.searcher.DiskDocVSReader.getInMemoryVectorStore(DiskDocVSReader.java:64)
2014-05-14 00:23:46,772 WARN  [STDERR] (Thread-37:)   at com.teneo.esa.analytics.semanticvector.additions.index.TermVectorGenerator.getExistingDocVS(TermVectorGenerator.java:266)
2014-05-14 00:23:46,772 WARN  [STDERR] (Thread-37:)   at com.teneo.esa.analytics.semanticvector.additions.index.TermVectorGenerator.generateTermVectors(TermVectorGenerator.java:137)
2014-05-14 00:23:46,772 WARN  [STDERR] (Thread-37:)   at com.teneo.esa.analytics.semanticvector.additions.index.BatchTermVectorGenerator.generateTVSInBatches(BatchTermVectorGenerator.java:141)
2014-05-14 00:23:46,772 WARN  [STDERR] (Thread-37:)   at com.teneo.esa.analytics.semanticvector.additions.index.BatchTermVectorGenerator.generateTermVectors(BatchTermVectorGenerator.java:76)
2014-05-14 00:23:46,772 WARN  [STDERR] (Thread-37:)   at com.teneo.esa.analytics.semanticvector.additions.index.SemanticIndexBuilder.generateIncrementalIndex(SemanticIndexBuilder.java:293)
2014-05-14 00:23:46,772 WARN  [STDERR] (Thread-37:)   at com.teneo.esa.analytics.semanticvector.additions.index.SemanticIndexBuilder.generateSemanticIndex(SemanticIndexBuilder.java:121)
2014-05-14 00:23:46,772 WARN  [STDERR] (Thread-37:)   at com.teneo.esa.analytics.semanticvector.additions.index.SemanticIndexBuilder.generateSemanticIndex(SemanticIndexBuilder.java:91)
2014-05-14 00:23:46,772 WARN  [STDERR] (Thread-37:)   at com.teneo.esa.analytics.semanticvector.SemanticVectorAnalyticsImpl.process(SemanticVectorAnalyticsImpl.java:76)
2014-05-14 00:23:46,772 WARN  [STDERR] (Thread-37:)   at com.teneo.esa.analytics.semanticvector.SemanticVectorAnalyticsService$1.run(SemanticVectorAnalyticsService.java:99)
2014-05-14 00:23:46,788 ERROR [analytics.semanticvector.SemanticVectorAnalyticsService] (Thread-37:) [#80007] Unexpected exception encountered: null
java.lang.NullPointerException at com.teneo.esa.analytics.semanticvector.additions.searcher.DefaultVectorStoreReader.getInMemoryVectorStore(DefaultVectorStoreReader.java:48)
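
To check whether a Search Analytics log contains this signature, the lines can be scanned for the two messages shown above. The following is a minimal Python sketch, not a product utility: the function name `find_vint_failures` is hypothetical, and it assumes the log keeps the same message wording and bracketed path as the excerpt.

```python
import re

# Substrings copied from the log excerpt above.
FETCH_MARKER = "Attempting to fetch the docVS from the local ProcessingDS"
ERROR_MARKER = "Invalid vInt detected (too many bits)"

# Captures the bracketed path at the end of the "Attempting to fetch" line.
PATH_IN_BRACKETS = re.compile(r"\[([^\[\]]+)\]\s*$")


def find_vint_failures(lines):
    """Return the vector store paths whose fetch was followed by the vInt error.

    `lines` is any iterable of log lines (for example an open file).
    Hypothetical helper for illustration only.
    """
    failures = []
    last_path = None
    for line in lines:
        if FETCH_MARKER in line:
            # Remember which store the reader was about to open.
            match = PATH_IN_BRACKETS.search(line)
            last_path = match.group(1) if match else None
        elif ERROR_MARKER in line:
            path = last_path or "<unknown store>"
            if path not in failures:
                failures.append(path)
    return failures
```

Passing an open log file to `find_vint_failures` returns the store paths (if any) that were being read when the exception was logged, which helps confirm that the failure is in the concept search vector store rather than elsewhere.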

Cause

The exception originates in Lucene: it is thrown when Lucene reads corrupted data from the index.
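
The message "Invalid vInt detected (too many bits)" comes from Lucene's variable-length integer decoding, which the stack trace shows being called from `DataInput.readString` to read a string length. The sketch below re-creates that format in Python for illustration only; it is not Lucene's or the product's code, and `read_vint` is a hypothetical name. It assumes the documented encoding: seven payload bits per byte, with the high bit marking that another byte follows, and at most five bytes per 32-bit value.

```python
def read_vint(data, pos=0):
    """Decode one variable-length int from `data`; return (value, next_pos).

    Illustrative re-creation of the vInt format. The value is returned
    as an unsigned Python int rather than a signed 32-bit Java int.
    """
    value = 0
    for shift in (0, 7, 14, 21):
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            # High bit clear: this was the last byte of the value.
            return value, pos
    # Fifth byte: only its low 4 bits fit into a 32-bit value.
    byte = data[pos]
    pos += 1
    if (byte & 0xF0) != 0:
        # Well-formed data never sets these bits, so the bytes are corrupt.
        raise OSError("Invalid vInt detected (too many bits)")
    value |= (byte & 0x0F) << 28
    return value, pos
```

A correctly written index never produces a fifth byte with its upper bits set, so reaching that branch means the bytes on disk are not what was originally written. This is why the message indicates corruption but says nothing about what caused it.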

The precise reason for the index corruption cannot be determined from the error message. Sometimes it is caused by a conflict between Lucene/Java jars; at other times by a corrupt disk segment, faulty RAM, or similar hardware problems.

Note: The corruption is limited to the Lucene index used by Concept Search and should not affect other areas of the product.

Resolution

Because the cause of the exception in the Search Analytics log cannot be determined, it is recommended to turn off Concept Search for the affected case.

Please follow the steps below:

  1. Select the affected case from 'All Cases' and click Settings.
  2. Under 'Configure processing parameters and features', uncheck 'Enable concept search' and click the Save button at the bottom.

Issue/Introduction

Post-processing fails at the Search Analytics phase due to Lucene-level corruption.