Ticket #821 (new defect)

Opened 6 years ago

Lucene is slow compared to MG

Reported by: ak19
Owned by: nobody
Priority: moderate
Milestone: 2.87 Release
Component: Collection Building
Severity: major
Keywords:
Cc:

Description

Diego wrote:

Clacso is a leading case here in Argentina. They have more than 12,000 documents in 150 collections. They run GS 2.52 with MG indexes, and when they run a query on the "general" collection (a supercollection that searches across all the collections), performance is very good.

If you look here:  http://sala.clacso.org.ar/gsdl/cgi-bin/library

and type something in the query box, such as "pobreza", you will get the following results within a few seconds:

Word counts: pobreza: 23984 4790 documents...

Now I'm migrating it to 2.85 using Lucene as the indexer. I also have a supercollection, but when I run a query the CPU goes to 100% and it takes minutes to get the results!

The link is

 http://sala.clacso.org.ar/gsdl285/cgi-bin/library.cgi?a=p&p=about&c=general&l=es&w=utf-8

Try "pobreza" again. You will have to wait a long time!

I tried many Java options. I edited lucene_query.pl to change the Java parameters, e.g.:

my $java_lucene = "\"$java\" -Xms1024m -Xmx1024m -XX:+AggressiveOpts -XX:+UseG1GC -classpath \"$classpath\" org.greenstone.LuceneWrapper.GS2LuceneQuery";

but nothing changed.
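
For reference, the kind of edit described above can be sketched as follows. This is only an illustration, not the actual contents of lucene_query.pl: the values given to $java and $classpath are placeholders (the real script computes them elsewhere), and the @jvm_opts list is a hypothetical helper introduced here so that individual JVM flags can be added or removed one at a time while testing.

```perl
use strict;
use warnings;

# Placeholder values (assumptions): lucene_query.pl sets these itself.
my $java      = "java";
my $classpath = "/path/to/LuceneWrapper.jar";

# JVM options quoted in the report, kept in a list so that each one
# can be toggled independently when comparing query times.
my @jvm_opts = ("-Xms1024m", "-Xmx1024m", "-XX:+AggressiveOpts", "-XX:+UseG1GC");

# Build the same command string as in the report: quoted java binary,
# JVM options, quoted classpath, then the query class.
my $java_lucene = join(" ",
    "\"$java\"",
    @jvm_opts,
    "-classpath", "\"$classpath\"",
    "org.greenstone.LuceneWrapper.GS2LuceneQuery");

# Print the command so it can be checked before it is run.
print "$java_lucene\n";
```

Printing the assembled command makes it easy to confirm which flags were actually passed to the JVM for a given test run.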

Where is the problem? The server resources? The way Lucene runs the queries? Some specific configuration for Apache?
