Opened 16 years ago

Closed 16 years ago

Last modified 4 years ago

#241 closed defect (fixed)

too many open files

Reported by: dmn
Owned by: kjdon
Priority: very high
Milestone: 3.04 Release
Component: Greenstone3 Runtime
Severity: major
Keywords:
Cc:

Description

Tomcat falls over with "too many open files" errors.

These are typically open .ldb files; in normal operation GS3 keeps several ldb file handles open per collection (3, 4, 5, 6, ...).

Linux systems usually have a per-process limit of 1024 open files. Over time the number of open files grows, and if it goes over the limit, Tomcat and GS3 crash.

Here is sample output from lsof showing six handles open at once on the same ldb file for the collection ikrgrxv:

java 23255 daven 191rR REG 0,18 143581 19743405 /home/daven/research/greenstone3/web/sites/localsite/collect/ikrgrxv/index/text/ikrgrxv.ldb (goblin:/export/home/staff/daven)
java 23255 daven 192rR REG 0,18 143581 19743405 /home/daven/research/greenstone3/web/sites/localsite/collect/ikrgrxv/index/text/ikrgrxv.ldb (goblin:/export/home/staff/daven)
java 23255 daven 193rR REG 0,18 143581 19743405 /home/daven/research/greenstone3/web/sites/localsite/collect/ikrgrxv/index/text/ikrgrxv.ldb (goblin:/export/home/staff/daven)
java 23255 daven 443rR REG 0,18 143581 19743405 /home/daven/research/greenstone3/web/sites/localsite/collect/ikrgrxv/index/text/ikrgrxv.ldb (goblin:/export/home/staff/daven)
java 23255 daven 444rR REG 0,18 143581 19743405 /home/daven/research/greenstone3/web/sites/localsite/collect/ikrgrxv/index/text/ikrgrxv.ldb (goblin:/export/home/staff/daven)
java 23255 daven 445rR REG 0,18 143581 19743405 /home/daven/research/greenstone3/web/sites/localsite/collect/ikrgrxv/index/text/ikrgrxv.ldb (goblin:/export/home/staff/daven)

This is mainly triggered by just re-configuring the collection repeatedly, e.g. with a URL such as:

http://kiwi.cs.waikato.ac.nz:8090/greenstone3/classic?a=s&sa=c

Guess: somewhere in the collection-reconfigure (or caching?) code we are holding onto references that keep file handles open, and with enough collections this will eventually kill Tomcat. Because most of the open file handles (approx. 75%) are ldb files, the problem is probably in the GDBM code, either the Java wrapper or the native code.
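As a purely hypothetical illustration (not the actual GS3 code; the class name is invented, and the GDBMWrapper package is assumed from what GS2Browse uses), this is the kind of pattern that would produce the symptom: each reconfigure builds a fresh service object with its own GDBMWrapper, and nothing ever closes the handle held by the object being replaced.

import org.greenstone.gsdl3.util.GDBMWrapper; // assumed package

// Hypothetical sketch of the suspected leak -- not the actual GS3 code.
public class BrowseService {
    protected GDBMWrapper gdbm_src = new GDBMWrapper();

    public boolean configure(String gdbm_db_file) {
        // opens an OS-level file descriptor that only an explicit close releases
        return this.gdbm_src.openDatabase(gdbm_db_file, GDBMWrapper.READER);
    }
}

// On every reconfigure, something like this happens:
//   BrowseService old_service = current_service;
//   current_service = new BrowseService();
//   current_service.configure(gdbm_db_file);
//   // old_service is dropped, but its ldb handle is never closed -> leak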

You will see errors in the Tomcat logs like this:

SEVERE: Error reading tld listeners
java.io.FileNotFoundException: /home/daven/research/greenstone3/packages/tomcat/work/Catalina/localhost/greenstone3/tldCache.ser (Too many open files)
java.io.FileNotFoundException: /home/daven/research/greenstone3/packages/tomcat/work/Catalina/localhost/greenstone3/tldCache.ser (Too many open files)
java.io.FileNotFoundException: /home/daven/research/greenstone3/packages/tomcat/conf/web.xml (Too many open files)

in

greenstone3/packages/tomcat/logs/catalina<date>.log

The actual files named in the errors vary, since whichever file open happens to fail first produces apparently unpredictable errors.

This Bash script gives you an idea of how to check this:

TOMCAT_ID=$(ps ux | grep tomcat | grep java | grep -v grep | awk -F" " '{ print $2 }')
echo "Tomcat_ID: $TOMCAT_ID"

NUM_OPEN_FILES=$(/usr/sbin/lsof -p $TOMCAT_ID | wc -l)
echo "open files: $NUM_OPEN_FILES"
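Comparing that count against the per-process limit (ulimit -n in the shell that launched Tomcat, or /proc/$TOMCAT_ID/limits on recent Linux kernels) shows how close the process is to falling over.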

Change History (6)

comment:1 by dmn, 16 years ago

It seems that the number of open ldb files is tied to the number of open services. In here:

http://trac.greenstone.org/browser/greenstone3/trunk/src/java/org/greenstone/gsdl3/service/GS2Browse.java?rev=14185

this code:

String gdbm_db_file = GSFile.GDBMDatabaseFile(this.site_home, this.cluster_name, index_stem);
if (!this.gdbm_src.openDatabase(gdbm_db_file, GDBMWrapper.READER)) {
    logger.error("Could not open GDBM database!");
    return false;
}

keeps an open file handle via

protected GDBMWrapper gdbm_src

If you force a cleanUp() at the end of the configure() method, the number of open files is reduced, but the web UI (and maybe the SOAP UI?) falls over.
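For reference, a forced cleanup would look something like this (a sketch only; it assumes GDBMWrapper exposes a closeDatabase() method, so check the wrapper's actual API):

public void cleanUp() {
    if (this.gdbm_src != null) {
        this.gdbm_src.closeDatabase(); // assumed method; releases the OS file handle
        this.gdbm_src = null;
    }
}

Calling this at the end of configure() would explain both observations: the handle is released (fewer open files), but the service is then left without an open database when the next browse request arrives, hence the UI falling over.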

If our architecture really keeps one file open per classifier (or one per browse service and one per search service, or similar), then I think it will not scale and needs to be fundamentally rewritten.
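One possible direction, sketched purely as a suggestion (this class and its methods are hypothetical, not part of GS3), is to share a single reader handle per .ldb file across all services via a cache keyed by path, so the number of open files is bounded by the number of distinct database files rather than by the number of services:

import java.util.HashMap;
import java.util.Map;

// Suggestion sketch only -- hypothetical class, not existing GS3 code.
public class GDBMHandleCache {
    private static final Map<String, GDBMWrapper> handles = new HashMap<String, GDBMWrapper>();

    // Returns a shared reader handle for db_file, opening it on first use.
    public static synchronized GDBMWrapper acquire(String db_file) {
        GDBMWrapper wrapper = handles.get(db_file);
        if (wrapper == null) {
            wrapper = new GDBMWrapper();
            if (!wrapper.openDatabase(db_file, GDBMWrapper.READER)) {
                return null;
            }
            handles.put(db_file, wrapper);
        }
        return wrapper;
    }

    // Closes every cached handle, e.g. on servlet shutdown or full reconfigure.
    public static synchronized void closeAll() {
        for (GDBMWrapper wrapper : handles.values()) {
            wrapper.closeDatabase(); // assumed GDBMWrapper method, as above
        }
        handles.clear();
    }
}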

comment:2 by dmn, 16 years ago

Owner: changed from nobody to kjdon

comment:3 by kjdon, 16 years ago

Status: new → assigned

comment:4 by kjdon, 16 years ago

Resolution: fixed
Status: assigned → closed

I have fixed the ever-increasing number of file handles. On a reconfigure, the old collection object wasn't being cleaned up properly. Now it is (hopefully). Multiple reconfigures shouldn't increase the number of file handles any more.
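The shape of that fix, sketched here with invented names (see the actual commit for the real change): when a collection is reconfigured, the old collection object's cleanUp() is now called so its services release their GDBM handles before the object is dropped.

// Illustrative sketch with hypothetical names -- not the actual GS3 diff.
public synchronized void reconfigureCollection(String coll_name) {
    Collection old_coll = this.loaded_collections.get(coll_name);
    this.loaded_collections.put(coll_name, loadCollection(coll_name));
    if (old_coll != null) {
        old_coll.cleanUp(); // closes the old services' open ldb file handles
    }
}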

However, there is still a problem: if the number of collections is too large, there could be too many file handles. See new ticket #250.
