CSVPlugin.pm

Timestamp:

2020-07-09T09:38:42+12:00 (4 years ago)

Author:

ak19

Message:

In commit 32810, Dr Bainbridge said he intended to commit his MetadataCSVPlugin-related work for dlheritage to the main GS after the then-upcoming GS3 release. His plugin changes support multiple values for a metadata field, and these changes work for me in the GS3tutorials collection, which uses a metadata.csv file. Like dlheritage, I also use the pipe symbol to separate multiple meta values for a meta field/column. Kathy had made a bugfix to MetadataCSVPlugin since Dr Bainbridge branched the code off for dlheritage. I will incorporate her bugfix into Dr Bainbridge's work, test that things still work, and commit that separately next. Committing from a uni machine, as something is weird about the WMTB VM where I tested these plugin changes and additions: svn committing hasn't been working there for a few days now and instead freezes while trying to transmit data.
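
For illustration, the pipe-separated multi-value convention described in the message could be handled roughly as follows. This is a minimal sketch, not the plugin's code: `split_md_values` is a hypothetical helper, the sample values are made up, and quoting the separator with `\Q...\E` is an assumption made here so that a bare `|` is matched literally rather than as regex alternation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CSVPlugin): split one CSV cell into
# multiple metadata values on a literal separator string.
sub split_md_values {
    my ($md_val, $md_val_sep) = @_;

    # No separator configured: treat the whole cell as a single value.
    return ($md_val) if !defined($md_val_sep) || $md_val_sep eq "";

    # \Q...\E quotes regex metacharacters, so "|" matches a literal pipe.
    # Empty pieces (for example from "a||b") are dropped.
    return grep { $_ ne "" } split(/\Q$md_val_sep\E/, $md_val);
}

my @values = split_md_values("red|green|blue", "|");
print join(", ", @values), "\n";
```

The changeset itself interpolates the `metadata_value_separator` option directly into the pattern (`split(/${md_val_sep}/, ...)`), so there the option value is treated as a regular expression. Whether the configured value is already escaped is not visible in this diff.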

File:

: 1 edited

main/trunk/greenstone2/perllib/plugins/CSVPlugin.pm (modified) (4 diffs)

Legend:

: Unmodified
: Added
: Removed

main/trunk/greenstone2/perllib/plugins/CSVPlugin.pm

-              r33389
+              r34249
 package CSVPlugin;
 use SplitTextFile;
 use MetadataRead;
+use CSVFieldSeparator;
 use strict;
 no strict 'refs'; # allow filehandles to be variables and viceversa
+use Text::CSV;
 # CSVPlugin is a sub-class of SplitTextFile.
 sub BEGIN {
     @CSVPlugin::ISA = ('MetadataRead', 'SplitTextFile');
+    @CSVPlugin::ISA = ('MetadataRead', 'SplitTextFile', 'CSVFieldSeparator');
+}
 my $arguments =
+    [ { 'name' => "process_exp",
+    [
+      { 'name' => "process_exp",
     'desc' => "{BaseImporter.process_exp}",
     'type' => "regexp",
 …
     push(@{$hashArgOptLists->{"OptList"}}, $options);
+    new CSVFieldSeparator($pluginlist, $inputargs, $hashArgOptLists);
     my $self = new SplitTextFile($pluginlist, $inputargs, $hashArgOptLists);
 …
     $$textref =~ s/^(.*?)\r?\n//;
     my @csv_file_fields = ();
+    my $csv_file_field_line = $1 . ",";  # To make the regular expressions simpler
+    while ($csv_file_field_line ne "") {
+    # Handle quoted values
+    if ($csv_file_field_line =~ s/^\"(.*?)\"\,//) {
+        my $csv_file_field = $1;
+        $csv_file_field =~ s/ //g;  # Remove any spaces from the field names
+        push(@csv_file_fields, $csv_file_field);
+    }
+    # Normal comma-separated case
+    elsif ($csv_file_field_line =~ s/^(.*?)\,//) {
+        my $csv_file_field = $1;
+        $csv_file_field =~ s/ //g;  # Remove any spaces from the field names
+        push(@csv_file_fields, $csv_file_field);
+    }
+    # The line must be formatted incorrectly
+    else {
+        print STDERR "Error: Badly formatted CSV field line: $csv_file_field_line.\n";
+        last;
+    }
+    my $csv_file_field_line = $1;
+    my $separate_char = $self->{'csv_field_separator'};
+    if ($separate_char =~ m/^auto$/i) {
+    $separate_char = $self->resolve_auto($csv_file_field_line,$self->{'plugin_type'});
+    # Replace the 'auto' setting the resolved value (for use later on)
+    $self->{'separate_char'} = $separate_char;
+    }
+    $self->{'csv_file_fields'}->{$filename} = \@csv_file_fields;
+    ###print STDERR "**** CSV file fields joined ($filename) = ", join(" ||| ", @{$self->{'csv_file_fields'}->{$filename}}), "\n";
+    my $csv = Text::CSV->new();
+    $csv->sep_char($separate_char);
+    if ($csv->parse($csv_file_field_line)) {
+    @csv_file_fields = $csv->fields;
+    }
+    else {
+    print STDERR "Error: Badly formatted CSV field line: $csv_file_field_line.\n";
+    }
+    $self->{'csv_file_fields'} = \@csv_file_fields;
+    # print STDERR "**** CSV file fields joined = ", join(" ||| ", @{$self->{'csv_file_fields'}}), "\n";
+}
 …
     my $section = $doc_obj->get_top_section();
     my $csv_line = $$textref;
+    my $filename_full_path = &FileUtils::filenameConcatenate($base_dir,$file);
+    my @csv_file_fields = @{$self->{'csv_file_fields'}->{$filename_full_path}};
+    ###print STDERR "**** CSV file fields joined = ", join(" ||| ", @csv_file_fields), "\n";
+     # Add the raw line as the document text
+    my @csv_file_fields = @{$self->{'csv_file_fields'}};
+    # Add the raw line as the document text
     $doc_obj->add_utf8_text($section, $csv_line);
+    my $separate_char = $self->{'separate_char'};
+    my $md_val_sep = $self->{'metadata_value_separator'};
+    undef $md_val_sep if ($md_val_sep eq "");
+    my $csv = Text::CSV->new();
+    $csv->sep_char($separate_char);
     # Build a hash of metadata name to metadata value for this line
+    my $i = 0;
+    $csv_line .= ",";  # To make the regular expressions simpler
+    while ($csv_line ne "") {
+    # Metadata values containing commas are quoted
+    if ($csv_line =~ s/^\"(.*?)\"\,//) {
+    if ($csv->parse($csv_line)) {
+    my @md_vals = $csv->fields;
+    my $md_vals_len = scalar(@md_vals);
+    for (my $i=0; $i<$md_vals_len; $i++) {
+        my $md_val = $md_vals[$i];
         # Only bother with non-empty values
+        if ($1 ne "" && defined($csv_file_fields[$i])) {
+        $doc_obj->add_utf8_metadata($section, $csv_file_fields[$i], $1);
+        if ($md_val ne "" && defined($csv_file_fields[$i])) {
+        if (defined $md_val_sep) {
+            my $md_name = $csv_file_fields[$i];
+            my @within_md_vals = split(/${md_val_sep}/,$md_val);
+            foreach my $within_md_val (@within_md_vals) {
+            $doc_obj->add_utf8_metadata($section, $md_name, $within_md_val);
+            }
+        }
+        else {
+            $doc_obj->add_utf8_metadata($section, $csv_file_fields[$i], $md_val);
+        }
+        }
+    }
+    # Normal comma-separated case
+    elsif ($csv_line =~ s/^(.*?)\,//) {
+        # Only bother with non-empty values
+        if ($1 ne "" && defined($csv_file_fields[$i])) {
+        $doc_obj->add_utf8_metadata($section, $csv_file_fields[$i], $1);
+        }
+    }
+    # The line must be formatted incorrectly
+    else {
+        print STDERR "Error: Badly formatted CSV line: $csv_line.\n";
+        last;
+    }
+    $i++;
+    }
+    else {
+    print STDERR "Error: Badly formatted CSV line: $csv_line.\n";
+    }
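
The Text::CSV-based parsing introduced by this change can be tried in isolation with a sketch like the one below. It assumes the Text::CSV module from CPAN is installed. `parse_csv_record` and the sample lines are hypothetical and only mirror the general shape of the new code (parse the field line, parse a data line, keep non-empty values that have a column name); they are not taken from the plugin.

```perl
use strict;
use warnings;
use Text::CSV;   # CPAN module; assumed to be installed

# Hypothetical helper: parse a header line and one data line with the given
# separator, and return a hash ref of field name => value for non-empty cells.
sub parse_csv_record {
    my ($header_line, $data_line, $sep_char) = @_;

    my $csv = Text::CSV->new({ sep_char => $sep_char, binary => 1 })
        or die "Cannot create Text::CSV parser\n";

    $csv->parse($header_line)
        or die "Badly formatted CSV field line: $header_line\n";
    my @field_names = $csv->fields;

    $csv->parse($data_line)
        or die "Badly formatted CSV line: $data_line\n";
    my @md_vals = $csv->fields;

    my %record;
    for my $i (0 .. $#md_vals) {
        # Only keep non-empty values that have a matching column name
        next unless defined $field_names[$i] && $md_vals[$i] ne "";
        $record{ $field_names[$i] } = $md_vals[$i];
    }
    return \%record;
}

# A quoted cell may contain the separator itself; the empty cell is skipped.
my $record = parse_csv_record('Title;Creator;Subject',
                              '"Maps; old and new";;History|Maps', ';');
print "$_ = $record->{$_}\n" for sort keys %record;
```

Letting the library handle quoting is what allows the hand-written regular expressions for quoted and unquoted cells to be dropped, and passing `sep_char` makes the separator configurable instead of hard-coding a comma.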

Changeset 34249 for main/trunk/greenstone2/perllib/plugins/CSVPlugin.pm
