classify.pm

Timestamp:

2009-09-10T10:46:36+12:00

Author:

davidb

Message:

Opening of txt2db moved to earlier in the buildcol process. This was done to avoid a huge memory spike that occurred with incremental building. Previously we reconstructed all the documents from the GDBM database, then the code added, edited and removed documents as required (i.e. the incremental bit), and then it wrote everything back out to GDBM. The problem was that the reconstructed documents could grow quite large: an example PagedImage collection of 100000 documents took 2.4 GB when read in. When it got to the stage of opening a pipe to the database with open('|txt2db'), the fork() call that occurs inside this function required the system to (briefly) have *two* 2.4 GB processes, before quickly replacing the child process with the much smaller 'txt2db' process. It is this duplication of the process that can cause a computer to run out of memory. In the PagedImage example, the machine had 2 GB of main memory and 2 GB of swap, so there was no way it could sustain two 2.4 GB processes.

Long explanation. The good news is that shifting the open() to before the documents are reconstructed solves the problem.
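
The ordering described in the message can be sketched roughly as follows. This is a minimal illustration, not the actual buildcol code: `open_writer`, `load_records` and `build` are hypothetical names, and a small `sh`/`cat` child stands in for txt2db so the sketch runs on any Unix-like system. The point is only that the pipe is opened (and the fork happens) while the parent process is still small, before the large in-memory structure is built.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for txt2db: a child process that copies its
# standard input into the file named by $path.
sub open_writer {
    my ($path) = @_;
    # The list form of a pipe open forks and execs the command directly.
    open(my $fh, '|-', 'sh', '-c', 'cat > "$1"', 'sh', $path)
        or die "cannot start writer: $!";
    return $fh;
}

# Hypothetical stand-in for reconstructing the documents from the database.
sub load_records {
    my ($count) = @_;
    my %recs = map { ("doc$_" => "record $_") } 1 .. $count;
    return \%recs;
}

sub build {
    my ($path, $count) = @_;

    # 1. Fork the writer first, while this process is still small.
    my $out = open_writer($path);

    # 2. Only now grow the parent by loading everything into memory.
    my $recs = load_records($count);

    # 3. Stream the records to the already-running child.
    for my $key (sort keys %$recs) {
        print {$out} "$key\t$recs->{$key}\n";
    }

    # Closing a pipe waits for the child and reports its exit status in $?.
    close($out) or die "writer failed: $! (exit status $?)";
    return scalar keys %$recs;
}

my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close($tmp_fh);
my $written = build($tmp_path, 1000);
print "wrote $written records to $tmp_path\n";
```

Swapping steps 1 and 2 gives the ordering the message describes as problematic: the fork would then happen only after the parent had already grown to its full size.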

File:

1 edited

gsdl/trunk/perllib/classify.pm (modified) (2 diffs)

gsdl/trunk/perllib/classify.pm

-              r19772
+              r20575
     my $infodb_type = shift(@_);
     my $infodb_file_path = shift(@_);
-    my %database_recs;
-    &dbutil::read_infodb_file($infodb_type, $infodb_file_path, \%database_recs);
+    my $database_recs = shift(@_);
     # dig out top level doc sections
     my %top_sections = ();
     my %top_docnums = ();
-    foreach my $key ( keys %database_recs )
+    foreach my $key ( keys %$database_recs )
     {
-    my $md_rec = $database_recs{$key};
+    my $md_rec = $database_recs->{$key};
     my $md_hash = db_rec_to_hash($md_rec);
 …
         add_section_content ($doc_obj, $top, $doc_db_hash);
         my $children = &get_children($doc_db_hash);
-        recurse_sections($doc_obj, $children, $oid, $top, \%database_recs);
+        recurse_sections($doc_obj, $children, $oid, $top, $database_recs);
     push(@all_docs,$doc_obj);

Changeset 20575 for gsdl/trunk/perllib/classify.pm