Changeset 27321

Timestamp: 09.05.2013 16:33:05 (6 years ago)
Author: ak19
Message:

Two bugfixes:
1. Handle quotes around all CSV fields, not just around the fields that contain commas. This can happen when a CSV file is exported from OpenOffice's Calc spreadsheet program.
2. When two PDFs, called one.pdf and two.pdf, have metadata assigned in meta.csv, the metadata was duplicated for two.pdf (2 dc.Title, 2 dc.Author). If the two PDFs were called 1.pdf and 2.pdf, the metadata was duplicated for both files. Thanks to Kathy, who found that this was related to the order in which the documents and meta.csv get processed when EmbeddedMetadataPlugin is also in the plugin list. She also found a different bug: while EmbeddedMetadataPlugin merged its own extrameta with existing extrameta, MetadataCSVPlugin did not merge but overwrote all metadata with its own. After adding merging of extrameta to MetadataCSVPlugin, the initial bug of duplicate assignment of the metadata in the CSV file was resolved too.
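
The two fixes described in the message can be illustrated with a minimal standalone sketch. It is not the plugin code: a plain hash stands in for the extrametadata store that the plugin reaches through extrametautil, and the helper names strip_surrounding_quotes and merge_or_set_metadata, as well as the ex.Pages metadata name, are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: strip one leading and one trailing double quote,
# using the same two substitutions as the changeset.
sub strip_surrounding_quotes {
    my ($value) = @_;
    $value =~ s/^"//;
    $value =~ s/"$//;
    return $value;
}

# Hypothetical stand-in for the extrametadata store: a plain hash mapping a
# filename regex to a table of metadata name => array of values.
sub merge_or_set_metadata {
    my ($store, $file_key, $new_meta) = @_;
    if (defined $store->{$file_key}) {
        # Merge: append the new values to whatever is already recorded.
        foreach my $metaname (keys %$new_meta) {
            # push autovivifies the array if this metadata name is new.
            push(@{ $store->{$file_key}{$metaname} }, @{ $new_meta->{$metaname} });
        }
    } else {
        # First metadata seen for this file: store the table as-is.
        $store->{$file_key} = $new_meta;
    }
    return $store;
}

# Assumed scenario: another plugin has already recorded metadata for two.pdf
# (ex.Pages is a made-up metadata name).
my %store    = ( 'two\.pdf' => { 'ex.Pages' => ['12'] } );
my %csv_line = ( 'dc.Title' => [ strip_surrounding_quotes('"Second Paper"') ] );

merge_or_set_metadata(\%store, 'two\.pdf', \%csv_line);

# Both the earlier metadata and the CSV metadata are now present.
print join(', ', sort keys %{ $store{'two\.pdf'} }), "\n";
print $store{'two\.pdf'}{'dc.Title'}[0], "\n";
```

With overwrite semantics, the earlier ex.Pages entry would have been lost. With the merge, the existing table is extended in place and each CSV value is appended once.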

Files:
1 modified

  • main/trunk/greenstone2/perllib/plugins/MetadataCSVPlugin.pm

    r24971 r27321
       145    145    my $found_filename_field = 0;
       146    146    for (my $i = 0; $i < scalar(@csv_file_fields); $i++) {
       147        -  # Remove any spaces from the field names
              147 +  # Remove any spaces from the field names, and surrounding quotes too
       148    148    $csv_file_fields[$i] =~ s/ //g;
              149 +  $csv_file_fields[$i] =~ s/^"//;
              150 +  $csv_file_fields[$i] =~ s/"$//;
              151 +
       149    152    if ($csv_file_fields[$i] eq "Filename") {
       150    153        $found_filename_field = 1;

       183    186            $csv_line_metadata{$csv_file_fields[$i]} = [];
       184    187            }
       185        -          push (@{$csv_line_metadata{$csv_file_fields[$i]}}, $1);
              188 +          # remove any surrounding quotes. (When exporting to CSV, some spreadsheet
              189 +          # programs add quotes even around field values that don't contain commas.)
              190 +          my $value = $1;
              191 +          $value =~ s/^"//;
              192 +          $value =~ s/"$//;
              193 +          push (@{$csv_line_metadata{$csv_file_fields[$i]}}, $value);
       186    194        }
       187    195        }

       212    220    $csv_line_filename = &util::filename_to_regex($csv_line_filename);
       213    221
       214        -  &extrametautil::setmetadata($extrametadata, $csv_line_filename, \%csv_line_metadata);
       215        -  &extrametautil::addmetakey($extrametakeys, $csv_line_filename);
              222 +  if (defined &extrametautil::getmetadata($extrametadata, $csv_line_filename)) { # merge with existing meta
              223 +
              224 +      my $file_metadata_table = &extrametautil::getmetadata($extrametadata, $csv_line_filename);
              225 +
              226 +      foreach my $metaname (keys %csv_line_metadata) {
              227 +      # will create new entry if one does not already exist
              228 +      push(@{$file_metadata_table->{$metaname}}, @{$csv_line_metadata{$metaname}});
              229 +      }
              230 +
              231 +      # no need to push $file on to $extrametakeys as it is already in the list
              232 +  } else { # add as new meta
              233 +
              234 +      &extrametautil::setmetadata($extrametadata, $csv_line_filename, \%csv_line_metadata);
              235 +      &extrametautil::addmetakey($extrametakeys, $csv_line_filename);
              236 +  }
       216    237    # record which file the metadata came from
       217    238    if (!defined &extrametautil::getmetafile($extrametafile, $csv_line_filename)) {