Changeset 27321 for main/trunk


Timestamp: 2013-05-09T16:33:05+12:00
Author: ak19
Message:

Two bugfixes:

1. Quotes may surround every CSV field, not just the fields that contain commas. This can happen when a CSV file is exported from OpenOffice's Calc spreadsheet program. Surrounding quotes are now stripped from both field names and field values.

2. When two PDFs called one.pdf and two.pdf had metadata assigned in meta.csv, the metadata was duplicated for two.pdf (2 dc.Title, 2 dc.Author). When the two PDFs were called 1.pdf and 2.pdf, the metadata was duplicated for both files. Thanks to Kathy, who found that this was related to the order in which the documents and meta.csv were processed when EmbeddedMetadataPlugin was also in the plugin list. She also found a different bug: EmbeddedMetadataPlugin merged its own extrameta with the existing extrameta, whereas MetadataCSVPlugin did not merge but overwrote all existing metadata with its own. Adding extrameta merging to MetadataCSVPlugin also resolved the original bug of metadata from the CSV file being assigned twice.
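The quote handling described in the first fix can be sketched in isolation as follows. This is a minimal illustration, not the plugin's code: the helper name strip_surrounding_quotes and the sample field values are hypothetical, and only the two substitutions mirror what the changeset adds.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MetadataCSVPlugin): remove at most one
# leading and one trailing double quote, using the same two substitutions
# that the changeset applies to field names and field values.
sub strip_surrounding_quotes {
    my ($value) = @_;
    $value =~ s/^"//;
    $value =~ s/"$//;
    return $value;
}

# Sample header fields as a spreadsheet export might quote them (made-up data).
my @raw_fields = ('"Filename"', '"dc.Title"', 'dc.Author');
my @fields = map { strip_surrounding_quotes($_) } @raw_fields;

print join(",", @fields), "\n";    # prints: Filename,dc.Title,dc.Author
```

Note that this approach only removes the outermost quotes. A doubled quote inside a value, which some CSV dialects use as an escape, is left untouched.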

File: 1 edited

  • main/trunk/greenstone2/perllib/plugins/MetadataCSVPlugin.pm

--- main/trunk/greenstone2/perllib/plugins/MetadataCSVPlugin.pm (r24971)
+++ main/trunk/greenstone2/perllib/plugins/MetadataCSVPlugin.pm (r27321)
@@ -145,6 +145,9 @@
 my $found_filename_field = 0;
 for (my $i = 0; $i < scalar(@csv_file_fields); $i++) {
-# Remove any spaces from the field names
+# Remove any spaces from the field names, and surrounding quotes too
 $csv_file_fields[$i] =~ s/ //g;
+$csv_file_fields[$i] =~ s/^"//;
+$csv_file_fields[$i] =~ s/"$//;
+
 if ($csv_file_fields[$i] eq "Filename") {
     $found_filename_field = 1;
@@ -183,5 +186,10 @@
         $csv_line_metadata{$csv_file_fields[$i]} = [];
         }
-        push (@{$csv_line_metadata{$csv_file_fields[$i]}}, $1);
+        # remove any surrounding quotes. (When exporting to CSV, some spreadsheet
+        # programs add quotes even around field values that don't contain commas.)
+        my $value = $1;
+        $value =~ s/^"//;
+        $value =~ s/"$//;
+        push (@{$csv_line_metadata{$csv_file_fields[$i]}}, $value);
     }
     }
@@ -212,6 +220,17 @@
 $csv_line_filename = &util::filename_to_regex($csv_line_filename);
 
-&extrametautil::setmetadata($extrametadata, $csv_line_filename, \%csv_line_metadata);
-&extrametautil::addmetakey($extrametakeys, $csv_line_filename);
+if (defined &extrametautil::getmetadata($extrametadata, $csv_line_filename)) { # merge with existing meta
+
+    my $file_metadata_table = &extrametautil::getmetadata($extrametadata, $csv_line_filename);
+
+    foreach my $metaname (keys %csv_line_metadata) {
+    # will create new entry if one does not already exist
+    push(@{$file_metadata_table->{$metaname}}, @{$csv_line_metadata{$metaname}});
+    }
+
+    # no need to push $file on to $extrametakeys as it is already in the list
+} else { # add as new meta
+
+    &extrametautil::setmetadata($extrametadata, $csv_line_filename, \%csv_line_metadata);
+    &extrametautil::addmetakey($extrametakeys, $csv_line_filename);
+}
 # record which file the metadata came from
 if (!defined &extrametautil::getmetafile($extrametafile, $csv_line_filename)) {
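
The merge-instead-of-overwrite behaviour added in the last part of the diff can be illustrated with a standalone sketch. It assumes the metadata table is a plain hash mapping a filename regex to a hash of metadata names and value lists; the real storage behind extrametautil is not shown in this changeset, so the sketch deliberately avoids calling it. The function name merge_metadata and all sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the extrametadata table:
#   filename regex => { metadata name => [ values ] }
# Assume another plugin has already stored a title for one.pdf.
my %extrametadata = (
    'one\.pdf' => { 'dc.Title' => ['Embedded title'] },
);

# Merge $new_meta into any existing entry for $file_regex instead of
# replacing that entry. Returns the resulting per-file metadata hash.
sub merge_metadata {
    my ($table, $file_regex, $new_meta) = @_;

    if (defined $table->{$file_regex}) {
        my $existing = $table->{$file_regex};
        foreach my $metaname (keys %$new_meta) {
            # push autovivifies the array if this name is not present yet
            push(@{ $existing->{$metaname} }, @{ $new_meta->{$metaname} });
        }
    } else {
        # no earlier metadata for this file: store the new hash as-is
        $table->{$file_regex} = $new_meta;
    }
    return $table->{$file_regex};
}

# Metadata for the same file as read from one CSV line (made-up values).
my $merged = merge_metadata(\%extrametadata, 'one\.pdf',
    { 'dc.Title' => ['CSV title'], 'dc.Author' => ['A. Author'] });

print join("; ", @{ $merged->{'dc.Title'} }), "\n";   # prints: Embedded title; CSV title
print join("; ", @{ $merged->{'dc.Author'} }), "\n";  # prints: A. Author
```

With an overwrite, the earlier title would be lost. With the merge, both values are kept and metadata names that were not present before are created on demand.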