Changeset 34506


Timestamp: 2020-10-22T01:35:22+13:00 (3 years ago)
Author: ak19
Message:

Redoing the work of commit revision 34394: redoing Bugfix 1 for the GLI doc.xml metadata slowdown, which resulted from an earlier bugfix that helped GLI cope with filenames and assigned metadata containing non-ASCII characters. The slowdown occurred when gathered files were selected in GLI and was fixed in commit 34394, but that fix was not ideal for two reasons:

1. It put a new form of filename encoding (hexed Unicode) into doc.xml instead of an existing encoding such as URL or base64, although those existing encodings weren't the right ones for my first solution.
2. The solution was specific to Windows, to cope with special characters in filenames, and relied on a new metadata field, gsdlfullsourcepath, being written out to doc.xml by doc.pm. A built collection moved from Linux to Windows therefore won't show its doc.xml metadata in GLI, as it won't have the new doc.xml metadata field that Windows is expecting.

I have a better solution for 1 that doesn't require the new field. I still can't fix all of point 2, as the existing gsdlsourcefilename metadata field in doc.xml can contain Windows short filenames when the collection is built on Windows, and this won't be backwards compatible on Linux anyway. This problem existed before too; I just didn't realise it until now. Even so, the new solution fixes more issues.

First step: undoing doc.pm's addition of the new metadata field gsdlfullsourcepath.

File:
1 edited

  • main/trunk/greenstone2/perllib/doc.pm

--- main/trunk/greenstone2/perllib/doc.pm	(revision 34394)
+++ main/trunk/greenstone2/perllib/doc.pm	(revision 34506)
@@ -125,14 +125,15 @@
         # For Unix-based systems, there is no difference between the two
         $self->{'source_path'} = $source_filename;
-        }       
-
+        }
+
+        ## Unused.
         # On Windows, the code above has ensured source_path is the Win long (full) path name.
-        # To help GLI associate metadata with an easily calculated and accurate representation of
-        # filenames, we now store the Win long path name, hex encoded.
+        # To help GLI associate DocXML metadata with an easily calculated and accurate representation
+        # of filenames, we now store the Win long path name, hex encoded.
         # We're not using this field on Linux, as I can't get the hex encodings generated to match
         # what GLI Java code generates. But for symmetry we store this field on Unix too, but we need
         # to hex-encode source_path on Unix too, or it may not be UTF-8 and doc.xml will be invalid
-        my $hexencodedlongsourcepath = &unicode::debug_unicode_string($self->{'source_path'});     
-        $self->set_utf8_metadata_element ($self->get_top_section(), "gsdlfullsourcepath", $hexencodedlongsourcepath);
+        ##my $hexencodedlongsourcepath = &unicode::debug_unicode_string($self->{'source_path'});       
+        ##$self->set_utf8_metadata_element ($self->get_top_section(), "gsdlfullsourcepath", $hexencodedlongsourcepath);
 
     }
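
The comments in the diff say the path is hex-encoded so that the value written to doc.xml stays valid even when source_path is not UTF-8. A minimal sketch of that general idea follows. It is not the Greenstone implementation: the helpers hex_encode_non_ascii and hex_decode are hypothetical, and the \x{...} output format is an assumption chosen for illustration, not a description of what unicode::debug_unicode_string produces.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Greenstone): replace every character
# outside printable ASCII, plus the backslash itself, with \x{HEX} so the
# result is pure ASCII and can be decoded unambiguously.
sub hex_encode_non_ascii {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7E]|\\)/sprintf("\\x{%04x}", ord($1))/ge;
    return $text;
}

# Hypothetical inverse: turn each \x{HEX} escape back into its character.
sub hex_decode {
    my ($text) = @_;
    $text =~ s/\\x\{([0-9a-fA-F]+)\}/chr(hex($1))/ge;
    return $text;
}

# Example path containing U+00E9 (e with acute accent).
my $path    = "docs/caf\x{e9}.txt";
my $encoded = hex_encode_non_ascii($path);
print "$encoded\n";    # prints docs/caf\x{00e9}.txt
```

Escaping the backslash as well keeps the encoding reversible for Windows-style paths, at the cost of less readable output. Both sides of such a scheme have to generate exactly the same escapes, which is the mismatch between the Perl and GLI Java encodings that the diff comments describe.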