Timestamp:
2019-11-28T22:17:15+13:00
Author:
ak19
Message:

Experimental encoding-related bugfix to GLI. In GLI, meta assigned at file level to filenames with non-ASCII chars was not sticking to the file, because repeated entries were written out to metadata.xml under 2 variants of the filename but never loaded back into GLI again.

This problem was not apparent with the old FilenameEncodings test set of docs or with Kathy's complex test case of Russian filenames gathered into a folder structure. In the latter case, meta was assigned at folder level, so the regex to match was .* which is just ASCII. Neither test set was tested with meta assigned at file level. I can't now remember whether we tested today if assigning file level meta to docs in the FilenameEncodings test set worked. If it did, maybe that was because the special characters in the docs where meta was assigned were not too complex and were just Latin-1 or Win codepage 850 (like 1252). In any case, the problem showed up with test docs whose filenames had A-macrons in them, and also in the Russian test set if meta got assigned at doc level.

GLI was correctly saving filenames that had meta into metadata.xml as hex-encoded filenames the first time around. It just wasn't comparing them to hex values on subsequent loads, and thus was not finding a match. Method FilenameEncoding.fileNameToHex() has been introduced to fix this (experimental, need to run some questions by Dr Bainbridge). For all current tests, this appears to have fixed it. However, there must be somewhere else that ex.meta is being loaded in, as that is still not appearing for specially named files.
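
The mismatch described above can be sketched roughly as follows. This is an illustration only, not the code in this changeset: the class HexFilenameSketch, its methods, and the \x{HHHH} escape format are all assumptions made for the example, and the actual format produced by FilenameEncoding.fileNameToHex() may differ. The point is that entries stored under a hex-encoded key are only found again if the lookup key goes through the same encoding.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the GLI source.
final class HexFilenameSketch {

    // Assumed encoding: ASCII is kept as-is, every other code point
    // is written as \x{HHHH}. The real GLI format may be different.
    static String toHex(String filename) {
        StringBuilder sb = new StringBuilder();
        filename.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(String.format("\\x{%04X}", cp));
            }
        });
        return sb.toString();
    }

    // Lookup as it behaved before the fix: the raw filename is compared
    // against keys that were stored in hex-encoded form.
    static String lookupRaw(Map<String, String> entries, String filename) {
        return entries.get(filename);
    }

    // Lookup after the fix: the filename is hex-encoded before comparing.
    static String lookupHex(Map<String, String> entries, String filename) {
        return entries.get(toHex(filename));
    }

    public static void main(String[] args) {
        String filename = "M\u0101ori.txt"; // contains a-macron (U+0101)

        // Stand-in for metadata.xml: entries keyed by hex-encoded filename.
        Map<String, String> entries = new HashMap<>();
        entries.put(toHex(filename), "dc.Title=Example");

        System.out.println("stored key: " + toHex(filename));
        System.out.println("raw lookup: " + lookupRaw(entries, filename));
        System.out.println("hex lookup: " + lookupHex(entries, filename));
    }
}
```

In this sketch the raw lookup misses for any filename containing a non-ASCII character, while a pure ASCII filename is unaffected because its encoded and raw forms are identical. That is consistent with the problem going unnoticed when meta was assigned at folder level with an ASCII-only pattern.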

File:
1 edited

  • main/trunk/gli/src/org/greenstone/gatherer/metadata/MetadataXMLFileManager.java

    --- main/trunk/gli/src/org/greenstone/gatherer/metadata/MetadataXMLFileManager.java (r23433)
    +++ main/trunk/gli/src/org/greenstone/gatherer/metadata/MetadataXMLFileManager.java (r33727)
    @@ -37,4 +37,6 @@
     import org.greenstone.gatherer.util.XMLTools;
     
    +import org.greenstone.gatherer.util.Utility;
    +
     
     /** This class is a static class that manages the metadata.xml files */
    @@ -78,5 +80,5 @@
         for (int i = 0; i < file_nodes.length; i++) {
             File current_file = file_nodes[i].getFile();
    -        DebugStream.println("Adding metadata to " + current_file.getAbsolutePath());
    +        DebugStream.println("Adding metadata to " + current_file.getAbsolutePath() + " - hex: " + Utility.debugUnicodeString(current_file.getAbsolutePath()));
     
             // Find which metadata.xml file needs editing