Timestamp:
2019-12-02T23:15:50+13:00 (4 years ago)
Author:
ak19
Message:

Bugfix so that GLI can parse metadata.xml files containing & chars. Basically, after FilenameEncoding.filtToURLEncoding() calls URI.toASCIIString(), I replace all non-hex-entity ampersands with their hex entity. This preserves them correctly in the metadata.xml files.

Some details: we need to use URI.toASCIIString(), which only converts char values > 127 in the URI to their URL-encoded hex values (and not to hex entities of the form &#x....;, as I now find, despite having carefully, and unnecessarily, coded around hex entities). The code for encoding values less than 127 as their URL hex codes therefore has to be written manually. Escaping & to its hex entity is complicated by the fact that we don't want to modify any existing hex entities in the ASCII string produced by toASCIIString(). (This is not a complication when we're dealing with strings containing URL/% encoded hex values.)

If more chars in metadata.xml pose a problem for GLI when parsing them, then a better solution should be devised to replace URI.toASCIIString(). If only stringToHex() would suffice. I think, however, that toASCIIString() properly URL-encodes Unicode codepoints, so that one can have 3 or 4 %XX in sequence, like %xx%xx%xx.

Work to be done: plus signs in filenames still need to be handled. GLI handles them okay as far as loading the correct file-level metadata attached to a filename containing + signs, but when built, the metadata for such a file does not appear in doc.xml. This may be a perl issue if it happens on build?
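The ampersand handling described in the message can be illustrated with a minimal sketch. This is not the GLI code: the class name, the method names and the use of a regex with a negative lookahead are all assumptions made for illustration, and the real change in FilenameEncoding may be implemented differently. The sketch shows URI.toASCIIString() percent-encoding a non-ASCII character while leaving & alone, followed by a pass that escapes only those ampersands that do not already begin a hex entity.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of GLI.
class AmpersandEscapeSketch {

    // Matches an '&' that does NOT begin a hex entity such as &#x26; or &#x4E2D;
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!#x[0-9A-Fa-f]+;)");

    // Replace every bare ampersand with its hex entity, leaving
    // existing hex entities untouched.
    static String escapeBareAmpersands(String s) {
        return BARE_AMPERSAND.matcher(s).replaceAll("&#x26;");
    }

    // Build a URI from a path and return its ASCII form. Non-ASCII
    // characters come back as UTF-8 %XX sequences; '&' is legal in a
    // URI path and is therefore left as-is.
    static String toAsciiUri(String path) {
        try {
            return new URI(null, null, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI path: " + path, e);
        }
    }

    public static void main(String[] args) {
        // \u00e9 is e-acute, written as an escape so the source stays ASCII
        String ascii = toAsciiUri("/docs/R&D-r\u00e9sum\u00e9.txt");
        System.out.println(ascii);
        System.out.println(escapeBareAmpersands(ascii));
        // An existing hex entity is preserved; only the bare '&' changes
        System.out.println(escapeBareAmpersands("a&#x26;b & c"));
    }
}
```

The lookahead is what keeps an already escaped &#x26; from being escaped a second time, which is the complication the message refers to.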

File:
1 edited

  • main/trunk/gli/src/org/greenstone/gatherer/metadata/MetadataXMLFile.java

    r33738  r33739
    668     668              }
    669     669
    670             -        String metadata_xml_file_directory_path = FilenameEncoding.filenameToURLEncoding(".");
    671             -        metadata_xml_file_directory_path = metadata_xml_file_directory_path.substring(0, metadata_xml_file_directory_path.length()-2); // cut off /. at end
    672             -        System.err.println("@@@ metadata_xml_file_directory_path: " + metadata_xml_file_directory_path);
            670     +        String curr_directory_path = FilenameEncoding.filenameToURLEncoding(".");
            671     +        curr_directory_path = curr_directory_path.substring(0, curr_directory_path.length()-2); // cut off /. at end
            672     +        //System.err.println("@@@ curr_directory_path: " + curr_directory_path);
    673     673
    674     674              //System.err.println("PARSED loaded_file contains:\n" +  XMLTools.elementToString(doc.getDocumentElement(), true));

    695     695
    696     696                          // now lop off the metadataxml dir prefix the FilenameEncoding.filenameToURLEncoding(STRING) variant would have added
    697             -                    encoded_filename = encoded_filename.substring(metadata_xml_file_directory_path.length());
            697     +                    encoded_filename = encoded_filename.substring(curr_directory_path.length());
    698     698                          if (encoded_filename.startsWith(FilenameEncoding.URL_FILE_SEPARATOR)) {
    699     699                              encoded_filename = encoded_filename.substring(FilenameEncoding.URL_FILE_SEPARATOR.length());
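
The prefix handling in this hunk can be sketched in isolation with plain strings. Everything below is hypothetical: the class and method names are made up, the separator is assumed to be "/", the sample paths are invented, and the startsWith guard is a defensive addition that the original code does not have. The sketch only shows the two steps visible in the diff: cutting the trailing "/." from the encoded current directory, then lopping that directory prefix and any leading separator off an encoded filename.

```java
// Hypothetical illustration class; not part of GLI.
class PrefixStripSketch {

    // Assumed value; the real constant is FilenameEncoding.URL_FILE_SEPARATOR.
    static final String URL_FILE_SEPARATOR = "/";

    // Mirrors substring(0, length()-2): drops the trailing "/." left over
    // from encoding the "." directory.
    static String cutTrailingDot(String encodedDotPath) {
        return encodedDotPath.substring(0, encodedDotPath.length() - 2);
    }

    // Removes the directory prefix and one leading separator, yielding a
    // path relative to that directory.
    static String stripDirPrefix(String encodedFilename, String dirPrefix) {
        if (!encodedFilename.startsWith(dirPrefix)) {
            return encodedFilename; // defensive guard, not in the original code
        }
        String relative = encodedFilename.substring(dirPrefix.length());
        if (relative.startsWith(URL_FILE_SEPARATOR)) {
            relative = relative.substring(URL_FILE_SEPARATOR.length());
        }
        return relative;
    }

    public static void main(String[] args) {
        // Invented sample values for illustration only
        String dir = cutTrailingDot("file:/home/user/collect/import/.");
        System.out.println(dir);
        System.out.println(stripDirPrefix(dir + "/sub/R&#x26;D.txt", dir));
    }
}
```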