Changeset 1251

Timestamp:

2000-06-29T10:34:25+12:00

Author:

sjboddie

Message:

Added some stats reporting and a warning message to the build code.
The build now warns when very little or no text is to be processed for a
given index (mg fails in these situations). This should help identify
attempts to create an index on metadata that is never set, and similar
mistakes.

Location:

trunk/gsdl/perllib

Files:

2 edited

mgbuilder.pm (modified) (5 diffs)
mgbuildproc.pm (modified) (7 diffs)

trunk/gsdl/perllib/mgbuilder.pm

-              r1246
+              r1251
     close ($handle) unless $self->{'debug'};
+    $self->print_stats();
     # create the compression dictionary
     # the compression dictionary is built by assuming the stats are from a seed
 …
            "", {}, $self->{'buildproc'}, $self->{'maxdocs'});
     close ($handle) unless $self->{'debug'};
+    $self->print_stats();
+}
 …
     close ($handle) unless $self->{'debug'};
+    $self->print_stats();
     if (!$self->{'debug'}) {
     # create the perfect hash function
 …
            "", {}, $self->{'buildproc'}, $self->{'maxdocs'});
+    $self->print_stats ();
     if (!$self->{'debug'}) {
 …
+}
+sub print_stats {
+    my $self = shift (@_);
+    my $indexing_text = $self->{'buildproc'}->get_indexing_text();
+    my $index = $self->{'buildproc'}->get_index();
+    my $num_bytes = $self->{'buildproc'}->get_num_bytes();
+    my $num_processed_bytes = $self->{'buildproc'}->get_num_processed_bytes();
+    if ($indexing_text) {
+        print STDERR "Stats (Creating index $index)\n";
+    } else {
+        print STDERR "Stats (Compressing text from $index)\n";
+    }
+    print STDERR "Total bytes in collection: $num_bytes\n";
+    print STDERR "Total bytes in $index: $num_processed_bytes\n";
+    if ($num_processed_bytes < 50) {
+        print STDERR "***************\n";
+        print STDERR "WARNING: There is very little or no text to process for $index\n";
+        if ($indexing_text) {
+            print STDERR "This may cause an error while attempting to build the index\n";
+        } else {
+            print STDERR "This may cause an error while attempting to compress the text\n";
+        }
+        print STDERR "***************\n";
+    }
+}
 1;

trunk/gsdl/perllib/mgbuildproc.pm

-              r1072
+              r1251
     $self->{'num_sections'} = 0;
     $self->{'num_bytes'} = 0;
+    $self->{'num_processed_bytes'} = 0;
     $self->{'indexing_text'} = 0;
 …
     $self->{'num_docs'} = 0;
     $self->{'num_sections'} = 0;
+    $self->{'num_processed_bytes'} = 0;
     $self->{'num_bytes'} = 0;
+}
 …
+}
+# num_bytes is the actual number of bytes in the collection
+# this is normally the same as what's processed during text compression
 sub get_num_bytes {
     my $self = shift (@_);
     return $self->{'num_bytes'};
+}
+# num_processed_bytes is the number of bytes actually passed
+# to mg for the current index
+sub get_num_processed_bytes {
+    my $self = shift (@_);
+    return $self->{'num_processed_bytes'};
+}
 …
+}
+sub get_index {
+    my $self = shift (@_);
+    return $self->{'index'};
+}
 sub set_classifiers {
     my $self = shift (@_);
 …
     $self->{'indexing_text'} = $indexing_text;
+}
+sub get_indexing_text {
+    my $self = shift (@_);
+    return $self->{'indexing_text'};
+}
 …
             if ($real_field eq "text") {
                 $new_text = $doc_obj->get_text ($section);
+                $self->{'num_processed_bytes'} += length ($new_text);
                 $new_text =~ s/[\cB\cC]//g;
                 $self->find_paragraphs($new_text);
 …
             foreach $meta (@{$doc_obj->get_metadata ($section, $real_field)}) {
                 $meta =~ s/[\cB\cC]//g;
+                $self->{'num_processed_bytes'} += length ($meta);
                 $new_text .= "\cC" unless $first;
                 $new_text .= $meta;
