Opened 12 years ago
#821 new defect
Lucene is slow compared to MG
Reported by: | ak19 | Owned by: | nobody |
---|---|---|---|
Priority: | moderate | Milestone: | Possible 2.88 Release |
Component: | Collection Building | Severity: | major |
Keywords: | Cc: |
Description
Diego wrote:
Clacso is a leading case here in Argentina. They have more than 12,000 documents in 150 collections. They run GS 2.52 with MG indexes, and when they run a query in the "general" collection (a supercollection that searches all the collections), the performance is very good.
If you look here: http://sala.clacso.org.ar/gsdl/cgi-bin/library
and type something in the query box, such as "pobreza", you will get the following results in a few seconds:
Word counts: pobreza: 23984 4790 documents...
Now I'm migrating it to 2.85 using Lucene as the indexer. I also have a supercollection, but when I run a query the CPU goes to 100% and it takes minutes to get the results!
The link is
http://sala.clacso.org.ar/gsdl285/cgi-bin/library.cgi?a=p&p=about&c=general&l=es&w=utf-8
Try "pobreza" again. You will have to wait a long time!
I tried many options for Java. I edited lucene_query.pl to change the Java parameters, e.g.:
my $java_lucene = "\"$java\" -Xms1024m -Xmx1024m -XX:+AggressiveOpts -XX:+UseG1GC -classpath \"$classpath\" org.greenstone.LuceneWrapper.GS2LuceneQuery";
but nothing changed.
Where is the problem? The server resources? The way Lucene does the queries? Some specific configuration for Apache?
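To put numbers on "a few seconds" versus "minutes", the same search could be timed against both installations. The following is only a minimal sketch: the helper names `time_call` and `fetch` are hypothetical, and the search URLs are left as arguments because the ticket gives only the entry pages, not the URL of a results page. It measures wall-clock time for the full HTTP response, so it cannot by itself tell whether the time is spent in Lucene, in Java startup, or in Apache.

```python
import time
import urllib.request


def time_call(fn, repeats=3):
    """Run fn() `repeats` times; return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def fetch(url, timeout=600):
    """Download the whole response body so the timing covers the full query."""
    # The long timeout is an assumption, chosen because the report mentions
    # queries that take minutes.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def compare(mg_url, lucene_url, repeats=3):
    """Time both search URLs and return the per-run timings for each."""
    return {
        "mg": time_call(lambda: fetch(mg_url), repeats),
        "lucene": time_call(lambda: fetch(lucene_url), repeats),
    }
```

To use it, run the "pobreza" search once in a browser on each installation, copy the two result-page URLs, and pass them to `compare`. Repeating each request a few times helps separate a slow first query from consistently slow ones.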