Ticket #947 (closed defect: fixed)

Opened 2 weeks ago

Last modified 2 weeks ago

Lucene index file locking fix

Reported by: ak19 Owned by: ak19
Priority: moderate Milestone: 3.09 Release
Component: Greenstone3 Runtime Severity: major
Keywords: filelocking Cc:

Description

The Lucene index file lock problem would occur because deactivating a collection on Windows didn't close all the file handles to the index dir.

The cause was that a new IndexReader was opened at every query and stored in the same member variable, overwriting the reference to the previously opened reader. The index reader was closed only once, when the collection was deactivated, so only the most recently opened reader was ever closed. Lots of zombie open IndexReaders were therefore still around on deactivation.

The initial solution was simple: close any already instantiated index reader before opening another one:

http://trac.greenstone.org/changeset/32609

http://trac.greenstone.org/changeset/32610
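A minimal sketch of the leak and of the close-before-reopen fix described above. It does not use Lucene: `FakeIndexReader`, `QueryHolder` and `ReaderLeakSketch` are hypothetical stand-ins invented for illustration, not the Greenstone classes, and the real changesets may differ in detail.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Lucene's IndexReader: it only tracks open/closed state.
class FakeIndexReader implements Closeable {
    // Every reader ever created, so we can count the ones left open.
    static final List<FakeIndexReader> opened = new ArrayList<>();
    private boolean open = true;
    final String indexDir;

    FakeIndexReader(String indexDir) {
        this.indexDir = indexDir;
        opened.add(this);
    }

    boolean isOpen() { return open; }

    @Override
    public void close() { open = false; }
}

// Hypothetical query object that keeps a single reader in a member variable.
class QueryHolder {
    private FakeIndexReader reader;

    // Leaky behaviour: the old reference is overwritten without being closed.
    void openLeaky(String indexDir) {
        reader = new FakeIndexReader(indexDir);
    }

    // Initial fix: close the previously opened reader before opening another.
    void openClosingPrevious(String indexDir) {
        if (reader != null) {
            reader.close();
        }
        reader = new FakeIndexReader(indexDir);
    }

    // Called once, on collection deactivate: closes only the current reader.
    void cleanUp() {
        if (reader != null) {
            reader.close();
            reader = null;
        }
    }
}

class ReaderLeakSketch {
    // Runs n "queries", deactivates, and returns how many readers are still open.
    static long runQueries(boolean leaky, int n) {
        FakeIndexReader.opened.clear();
        QueryHolder holder = new QueryHolder();
        for (int i = 0; i < n; i++) {
            if (leaky) holder.openLeaky("didx");
            else holder.openClosingPrevious("didx");
        }
        holder.cleanUp();
        return FakeIndexReader.opened.stream().filter(FakeIndexReader::isOpen).count();
    }

    public static void main(String[] args) {
        System.out.println("still open (leaky): " + runQueries(true, 3));
        System.out.println("still open (fixed): " + runQueries(false, 3));
    }
}
```

With three queries, the leaky variant leaves the first two readers open after deactivation, while the close-before-reopen variant leaves none.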

However, investigating the file locking problem exposed a bigger one in the Lucene code: it allowed multiple concurrent queries, but they all used a single GS2LuceneQuery object per collection. So a query running at the same time as others could easily end up using another query's configuration.
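The hazard can be illustrated without threads by interleaving two requests by hand. This is only a sketch: `SketchQuery` is a hypothetical stand-in for a configurable query object (its setters and `run` method are assumptions, not the GS2LuceneQuery API), and the interleaving is forced deterministically rather than produced by real concurrency.

```java
// Hypothetical stand-in for a query object that holds per-search configuration.
class SketchQuery {
    private String index = "didx";
    private int maxDocs = 10;

    void setIndex(String index) { this.index = index; }
    void setMaxDocs(int maxDocs) { this.maxDocs = maxDocs; }

    // Reports which configuration the search actually ran with.
    String run(String term) {
        return term + "@" + index + " max=" + maxDocs;
    }
}

class SharedQuerySketch {
    // One query object shared by every request to the collection.
    static final SketchQuery shared = new SketchQuery();

    // Simulates request B configuring the shared object between
    // request A's configuration step and request A's run step.
    static String interleaved() {
        // Request A configures its search.
        shared.setIndex("sidx");
        shared.setMaxDocs(5);
        // Request B arrives and reconfigures the same object.
        shared.setIndex("didx");
        shared.setMaxDocs(50);
        // Request A now runs, but with request B's settings.
        return shared.run("snail");
    }

    public static void main(String[] args) {
        System.out.println(interleaved());
    }
}
```

Request A asked for the `sidx` index with at most 5 documents, yet its search runs against `didx` with a limit of 50.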

This was fixed here:

http://trac.greenstone.org/changeset/32619

http://trac.greenstone.org/changeset/32620

http://trac.greenstone.org/changeset/32636

The solution is explained in the commit message:

"3 significant changes in 1 commit particularly impacting Lucene queries:

1. Instead of GS2LuceneSearch having a GS2LuceneQuery object member variable for doing each and every search, each query now instantiates its own local GS2LuceneQuery object, configures it for that specific search, runs the search and then the GS2LuceneQuery object expires. This fixes a bug by preventing multiple concurrent searches getting the search configurations of other searches run at the same time.

2. Though GS2LuceneQuery objects need to be instantiated 1 per query over a collection, we don't want to keep reopening a collection's sidx and didx index folders with IndexReader objects for every query. Since IndexReaders support concurrent access, we'd like to use one IndexReader per collection index (one for didx, one for sidx) with the IndexReaders existing for the life of a collection. This meant moving the maintaining of IndexReader objects from GS2LuceneQuery into the GS2LuceneSearch service and turning them into singletons by using a HashMap to maintain index-dir, reader pairs. GS3 Services, e.g. GS2LuceneSearch, are loaded and unloaded on collection activate and deactivate respectively. On deactivate, cleanUp() is called on services and other GS3 modules. When GS2LuceneSearch.cleanUp() is called, we now finally close the singleton IndexReader objects/resources that a collection's GS2LuceneSearch object maintains.

3. Redid previous bugfix (then committed to GS2LuceneQuery): Point 2 again solves the filelocking problem of multiple handles to the index being opened and not all being closed on deactivate, but it's solved in a different and better/more optimal way than in the previous commit."
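The design in points 1 and 2 can be sketched roughly as follows. This is not the committed Greenstone code: `StubReader`, `LocalQuery` and `SearchServiceSketch` are hypothetical stand-ins, and the sketch uses a `ConcurrentHashMap` where the commit message mentions a HashMap, purely as an assumption to keep the example thread-safe on its own.

```java
import java.io.Closeable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Lucene's IndexReader; counts currently open readers.
class StubReader implements Closeable {
    static final AtomicInteger openCount = new AtomicInteger();
    final String indexDir;

    StubReader(String indexDir) {
        this.indexDir = indexDir;
        openCount.incrementAndGet();
    }

    @Override
    public void close() { openCount.decrementAndGet(); }
}

// Hypothetical per-search object: created, configured, run once, then discarded.
class LocalQuery {
    private final StubReader reader;
    private final int maxDocs;

    LocalQuery(StubReader reader, int maxDocs) {
        this.reader = reader;
        this.maxDocs = maxDocs;
    }

    String run(String term) {
        return term + "@" + reader.indexDir + " max=" + maxDocs;
    }
}

// Hypothetical service: owns one reader per index dir for the collection's lifetime.
class SearchServiceSketch {
    private final Map<String, StubReader> readers = new ConcurrentHashMap<>();

    // Opens the reader for an index dir on first use, then reuses it.
    private StubReader readerFor(String indexDir) {
        return readers.computeIfAbsent(indexDir, StubReader::new);
    }

    // Each search gets its own query object, so configurations cannot mix.
    String search(String indexDir, String term, int maxDocs) {
        LocalQuery query = new LocalQuery(readerFor(indexDir), maxDocs);
        return query.run(term);
    }

    int cachedReaders() { return readers.size(); }

    // Called on collection deactivate: closes every reader the service holds.
    void cleanUp() {
        for (StubReader reader : readers.values()) {
            reader.close();
        }
        readers.clear();
    }
}
```

Because every reader that is opened is also registered in the map, cleanUp() can close all of them, which is what releases the file handles on deactivate.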

NOTES:

- MGPP only needed its method signatures, and the code affected by the changed signatures, to be updated. (MG didn't have the same method, so it was not affected.) MG and MGPP use synchronization because they're not "multi-threaded reentrant" the way Lucene's IndexReader object is.

- For Solr too, only the method signatures and the code affected by that change needed updating. Our Solr ext code does not open an IndexReader (or even instantiate a Searcher object with it). Our code only obtains an IndexSearcher and calls getReader on that to get access to the IndexReader. The IndexReader (and IndexSearcher) object is part of the HttpSolrServer side of the code, the side which takes care of doGet and doPost requests. And our custom Greenstone3SearchHandler class on the HttpSolrServer side already inherits the opened IndexReader (and associated instantiated IndexSearcher). We may assume that the HttpSolrServer side takes care of optimally opening and maintaining IndexReader objects (and any way this may affect IndexSearchers), and that we can forget about this part.

Change History

Changed 2 weeks ago by ak19

  • status changed from new to closed
  • resolution set to fixed