Opened 10 years ago
Closed 9 years ago
#885 closed defect (fixed)
Run Solr 4.7.2 as a servlet inside tomcat, instead of building solr col launching jetty server
Reported by: | ak19 | Owned by: | ak19 |
---|---|---|---|
Priority: | moderate | Milestone: | 3.07 Release |
Component: | Greenstone3 Runtime | Severity: | major |
Keywords: | Cc: |
Description
Running solr with GS3 server removes cores at ant stop preventing successful Jetty server start
Running the Jetty server runs the local solr web interface. We use this to test new things we can try to do with solr. I think we can run the jetty server only when the GS3 server is stopped.
The buildcol process also runs the jetty server to ingest documents.
When we run the GS3 server and do a search in a solr collection, it adds cores to the file web/ext/solr/solr.xml. If we then stop the GS3 server, the cores are removed again from solr.xml. We can't successfully run the jetty server at this point, since it can't find the cores in solr.xml and we have to re-run the buildcol process on the collection to get them back.
However, if we run the GS3 server and stop it again immediately, the cores remain in the solr.xml file. At this point, we can run the jetty server.
What seems to be happening is a symmetric loading of solr cores on startup, and unloading of cores on stopping the GS3 server. See SolrSearch.java, configure() and cleanup() methods. The symmetry makes sense, but we need to investigate whether it is really necessary to remove the solr cores on shutdown, since this is what is preventing us from running the jetty server whenever we want. Does any part of the solr-related code, including activate.pl, actually require that the solr cores be removed? Start by looking at activate.pl, ext/solr/perllib/solrserver.pm, GS3 src code's gsdl3/service/SolrSearch.java
Change History (7)
comment:1 by , 10 years ago
Summary: | Running solr with GS3 server removes cores at ant stop preventing successful Jetty server start → Run Solr 4.7.2 as a servlet inside tomcat, instead of building solr col launching jetty server |
---|
comment:2 by , 10 years ago
For the upcoming GS3 workshop, some fixes have been made to GLI to deal with solr collections. Among them is a temporary fix whereby GLI now stops the GS3 server before building a SOLR collection and then restarts the GS3 server after the solr collection has been built. The other fix to GLI is permanent, and concerns the preservation of solr-specific elements in the collectionConfig.xml. In the past, if you had opened a solr collection in GLI, GLI would have clobbered all the solr-specific elements in the collectionConfig.xml, such as <facet>, <solr> and the newly-introduced <option> subelement of <index>.
Added code to allow GLI to preserve any solr-specific <sort> and <facet> subelements of <search> if these were manually-added to a GS3 collectionConfig.xml
Also see http://trac.greenstone.org/changeset/29177
GLI now preserves the newly added (optional) option subelements of collectionConfig.xml's index element. This is only used for solr collections at present when the user hand-edits collectionConfig.xml and specifies the solr field type (option-name solrfieldtype) for an index other than the default text_en_splitting. E.g. type text_es for index allfields.
(Default is text_en_splitting)
- TEMPORARY: http://trac.greenstone.org/changeset/29222
SOLR related. TEMPORARY changes for the GS3 workshop. Since the change to Solr 4.7.2, solr collections can't be rebuilt with activate turned on while the GS3 server is running: the jetty server launched by buildcol conflicts with the running GS3 server and finds a lock on the index. The result is that one can't search the solr index after such a rebuild. Dr Bainbridge suggested a temporary measure: instead of building solr collections from the command line, we will now build them in GLI. GLI will build solr collections with activate on but, for solr collections alone, it will stop the GS3 server before a build and start it again upon completion. In future, we will get rid of the solr jetty server and just have solr running over HTTP from tomcat. At that point, the Java GS3 runtime code will have to access Solr through an HttpSolrServer rather than an EmbeddedSolrServer.
comment:3 by , 10 years ago
Lucene/Solr upgrade from version 3.3 to 4.7.2 involved commit revisions between 29133 of 16.07.2014 and 29228 of 21.08.2014
comment:4 by , 9 years ago
And a further commit (important fix) at http://trac.greenstone.org/changeset/29355
Commits are for tickets http://trac.greenstone.org/ticket/872 and http://trac.greenstone.org/ticket/666
comment:5 by , 9 years ago
Making Solr run off the tomcat server, rather than using the jetty server included with solr:
(http://trac.greenstone.org/changeset/29708, http://trac.greenstone.org/changeset/29709, http://trac.greenstone.org/changeset/29710)
MAJOR CHANGES:
http://trac.greenstone.org/changeset/29711
A bug still remains and is visible after rebuilding a solr or even lucene collection, where the 2nd page of search results is empty unless the server is restarted.
http://trac.greenstone.org/changeset/29714
http://trac.greenstone.org/changeset/29722
http://trac.greenstone.org/changeset/29749
http://trac.greenstone.org/changeset/29751
Limiting access to the /solr servlet, changes for Linux then Windows:
http://trac.greenstone.org/changeset/29723
http://trac.greenstone.org/changeset/29754
To work with commit 29687, which split web.xml into web.xml and servlets.xml and included the latter's contents in web.xml as an entity. This broke XML parsing in gs3-server when viewing File > Settings, and on GLI startup:
http://trac.greenstone.org/changeset/29722
http://trac.greenstone.org/changeset/29728
http://trac.greenstone.org/changeset/29729
http://trac.greenstone.org/changeset/29730
comment:6 by , 9 years ago
Successfully moved the ext/solr SolrQueryWrapper.getTerms() methods to the solr server side.
http://trac.greenstone.org/changeset/29986
The getTerms() functionality previously used with the EmbeddedSolrServer has now been re-implemented for HttpSolrServer with the new custom Greenstone Solr RequestHandler class Greenstone3SearchHandler, which lives on the solr server side, in tomcat's solr webapp. The functionality has also been improved. It can now handle searches such as: econom* cat, by recursively calling setRewriteMethods on any PrefixQuery and WildcardQuery MultiQueries within an overall BooleanQuery, and it handles BooleanQuery.TooManyClauses exceptions when the number of expanded terms is too large, such as for a search of a*.
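As a rough illustration of the recursive expansion and overflow handling described above, here is a minimal plain-Java sketch. It does not use the Lucene API: the Node, Term, Prefix, Bool and TooManyClauses types are hypothetical stand-ins, only prefix patterns are modelled (not general wildcards), and the fallback of reporting the unexpanded pattern when the limit is exceeded is an assumption, not necessarily what Greenstone3SearchHandler does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of term expansion inside a nested boolean query (not Lucene code). */
final class QueryExpansionSketch {
    interface Node {}

    /** A plain search term, e.g. "cat". */
    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
    }

    /** A prefix pattern, e.g. "econom" for the query word econom*. */
    static final class Prefix implements Node {
        final String prefix;
        Prefix(String prefix) { this.prefix = prefix; }
    }

    /** A boolean query holding any mix of terms, prefixes and nested boolean queries. */
    static final class Bool implements Node {
        final List<Node> clauses;
        Bool(Node... clauses) { this.clauses = Arrays.asList(clauses); }
    }

    /** Hypothetical stand-in for BooleanQuery.TooManyClauses. */
    static final class TooManyClauses extends RuntimeException {
        TooManyClauses(String pattern) { super("too many expansions for " + pattern); }
    }

    /** Expands one prefix against the indexed terms, failing once maxClauses is exceeded. */
    static List<String> expandPrefix(String prefix, List<String> indexTerms, int maxClauses) {
        List<String> matches = new ArrayList<>();
        for (String term : indexTerms) {
            if (term.startsWith(prefix)) {
                if (matches.size() == maxClauses) {
                    throw new TooManyClauses(prefix + "*");
                }
                matches.add(term);
            }
        }
        return matches;
    }

    /** Walks the query tree recursively, expanding every prefix clause it finds. */
    static void collectTerms(Node node, List<String> indexTerms, int maxClauses, List<String> out) {
        if (node instanceof Bool) {
            for (Node clause : ((Bool) node).clauses) {
                collectTerms(clause, indexTerms, maxClauses, out);
            }
        } else if (node instanceof Prefix) {
            String prefix = ((Prefix) node).prefix;
            try {
                out.addAll(expandPrefix(prefix, indexTerms, maxClauses));
            } catch (TooManyClauses e) {
                // Assumed policy: keep the unexpanded pattern instead of failing the search.
                out.add(prefix + "*");
            }
        } else if (node instanceof Term) {
            out.add(((Term) node).text);
        }
    }

    /** Returns the flat list of terms the query stands for. */
    static List<String> terms(Node query, List<String> indexTerms, int maxClauses) {
        List<String> out = new ArrayList<>();
        collectTerms(query, indexTerms, maxClauses, out);
        return out;
    }
}
```

For example, with an index containing cat, economic, economy and eel, the query econom* cat expands to economic, economy, cat with a limit of 10, and to econom*, cat with a limit of 1.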
comment:7 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
The original title of this ticket was "Running solr with GS3 server removes cores at ant stop preventing successful Jetty server start"
The problem is actually different. In Solr 4.7.2, the jetty server launched by buildcol to build the solr collection interferes with the GS3 tomcat server if the latter is already running: an index locked SOLR exception is thrown. This wasn't a problem with Solr for Lucene 3.3, because the jetty server could run independently of a running GS3 server and access the same Solr index.
The problem is described in detail in the email to Dr Bainbridge "Testing solrbuilder URL commands against a running Jetty Solr server" of 19/08/14 21:32
Dr Bainbridge's suggested solution is in the email "Re: Testing solrbuilder URL commands against a running Jetty Solr server" of 19/08/14 23:03, as follows:
"Have only made a very quick scan through the details, but there are some very useful insights in this. My first response is that I think we should think fairly seriously about merging (in some way) what we currently run as the 'jetty' servlet for Solr in with the Tomcat one. That way we would only ever be running one server. I'm optimistic that the merging could be mostly achieved by putting the solr.jar file (or what ever it is that jetty likes) into the tomcat servlet area, so things like 'ant start' and 'ant stop' make both a localhost:8383/greenstone/... URL valid, but also something like localhost:8383/solr/...
Then we could do something like:
Step 2 is less clear in my mind, so we might want to read around the subject a bit in terms of what exactly Solr's Java classes can do, and so what is the simplest way to code up what we need."
For step 1, see https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+Tomcat
Step 2 would require changing over from using an EmbeddedSolrServer in the Java code to HttpSolrServer (GS2SolrSearch.java in GS3/src/java and SolrQueryWrapper.java in ext/solr/src).
The solr class EmbeddedSolrServer has a function that returns the CoreContainer, from which the solr cores can be obtained. (The class SolrDispatchFilter also provides access to the CoreContainer of a running solr server). However, the HttpSolrServer java class does not give access to the CoreContainer, so we need to code things differently. Our own SolrQueryWrapper.getTerms() Greenstone function obtains the EmbeddedSolrServer to work out the term frequency of search terms in the index and documents returned. So we can't go this route with the HttpSolrServer.
Dr Bainbridge suggests that on the solr side, the java code (in one of the jars) must be using an EmbeddedSolrServer. And if on the solr server side we were to modify the code to work out the term frequencies, then on our SolrQueryWrapper side, we can send off a request over http to the running solr servlet and request the term frequencies.
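A minimal sketch of what the client side of such a request could look like, using only the Java standard library to build the request URL. The /termfreq handler path and the core name used in the example are hypothetical; the real path would be whatever the handler is registered under in the solr webapp. The host and port are the ones from the email quoted above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Sketch only: builds the URL for a term frequency request to the solr servlet in tomcat. */
final class SolrTermsRequest {
    // Hypothetical handler path, not the name of an actual Greenstone handler.
    static final String HANDLER = "/termfreq";

    /**
     * @param solrBase base URL of the solr servlet, e.g. "http://localhost:8383/solr"
     * @param core     name of the solr core to query (hypothetical naming)
     * @param query    raw user query, e.g. "econom* cat"
     */
    static String buildUrl(String solrBase, String core, String query) {
        // URL-encode the query so that spaces and special characters survive transport.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        // q and wt are standard Solr request parameters; wt=xml asks for an XML response.
        return solrBase + "/" + core + HANDLER + "?q=" + q + "&wt=xml";
    }
}
```

The URL could then be fetched with any HTTP client, and the response parsed for the term frequencies, so that the client never needs access to the CoreContainer.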
The sequence is:
Test that the solr collections rebuild with -activate, with both the GS3 server already running and not running. Test that after rebuild searching still works. In particular, when the GS3 server is already running, test that after rebuilding with the word "mouse" in a document that never contained that word, the modified document now shows up in the search results when searching at document (not section) level for "mouse".