Timestamp:
2010-12-06T13:15:10+13:00
Author:
davidb
Message:

Further changes to handle documents whose filenames use different encodings on the file system. The UTF8URL metadata is now set so that cross-document lookups work. Files stored in doc.pm as associated files are now always raw filenames (rather than potentially UTF-8 encoded ones). Filenames that HTMLPlug sees when scanning for files to block are now stored as Unicode-aware strings rather than as UTF-8 encoded but Unicode-unaware strings.

File:
1 edited

  • main/trunk/greenstone2/perllib/plugins/ReadTextFile.pm

    --- main/trunk/greenstone2/perllib/plugins/ReadTextFile.pm (r23363)
    +++ main/trunk/greenstone2/perllib/plugins/ReadTextFile.pm (r23387)
    @@ -307,4 +307,11 @@
         $reader->set_encoding($encoding);
         $reader->decode_text($raw_text,$textref);
    +
    +    # At this point $$textref is a binary byte string
    +    # => turn it into a Unicode aware string, so full
    +    # Unicode aware pattern matching can be used.
    +    # For instance: 's/\x{0101}//g' or '[[:upper:]]'
    +
    +    $$textref = decode("utf8",$$textref);
         }
     }
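
The effect of the added decode() call can be illustrated with a minimal standalone sketch. It is not part of the changeset: the sample bytes and all variable names are hypothetical, and it assumes only the core Encode module. It shows why a UTF-8 byte string has to be turned into a Unicode-aware string before patterns such as 's/\x{0101}//g' or '[[:upper:]]' behave as intended.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the UTF-8 bytes for U+0100 (capital A with macron)
# followed by U+0101 (small a with macron).
my $bytes = "\xC4\x80\xC4\x81";

# As a binary byte string, the data is four separate octets.
my $byte_len = length($bytes);                          # 4

# A pattern naming a code point cannot match individual octets.
my $bytes_match = ($bytes =~ /\x{0101}/) ? 1 : 0;       # 0

# Decode into a Unicode-aware string, as the changeset does for $$textref.
my $text = decode("utf8", $bytes);
my $char_len = length($text);                           # 2

# Unicode-aware matching now sees whole characters.
my $starts_upper = ($text =~ /^[[:upper:]]/) ? 1 : 0;   # 1

# Remove every U+0101, leaving only U+0100.
(my $stripped = $text) =~ s/\x{0101}//g;
my $stripped_len = length($stripped);                   # 1
```

The same byte sequence is four characters long before decoding and two after, which is why matching on the undecoded text silently misses non-ASCII characters.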