Timestamp:
2019-05-03T21:46:42+12:00
Author:
ak19
Message:
  1. Dr Bainbridge's fix for the Windows encoding issue that arises when online doc editing a Title meta containing non-ASCII/non-basic unicode chars was to URL encode the QUERY_STRING so that CGI.pm would then do the right thing. The problem had turned out not to be env vars on Windows, which could set and recall unicode and win-1252 chars just fine and therefore retained what was entered. Rather, on Windows, perl did not receive the actual chars transmitted in the case of UTF-8, whereas win-1252 chars were received looking apparently like a unicode codepoint; in the latter case, the utf82unicode call in metadataaction would then clobber the codepoint when attempting utf82unicode on it. On Linux, perl happened to receive the chars as utf8-encoded bytes, so utf82unicode worked (to make them unicode-aware strings?). The real problem was that things could go wrong in different ways on Windows: since utf8 chars weren't even received properly by perl/CGI, we didn't even need to start worrying about them sometimes getting clobbered in metadataaction. URL encoding the QUERY_STRING was meant to solve this. However, URL encoding the whole QUERY_STRING made CGI.pm choke on the equals signs between param name and param value, and possibly on other chars. I don't know why. I found that URL encoding just the param values in the QUERY_STRING works, so I am committing that.
  2. Renaming the recently introduced string2hex() in JavaScript to debug_unicode_string() and stringToHex() in Java to debugUnicodeString(), to be more consistent with the perl variant, debug_unicode_string. Also, as in the perl, the JavaScript and Java now print the unicode value inside curly braces for better legibility.
  3. Leaving in some commented-out encoding debugging statements in the Java and JavaScript code, but not committing the debugging on the perl side.
  4. Some further improvements to the overloaded functions in GSXML that use debugUnicodeString for converting XML elements to strings or printing them to logs.
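The QUERY_STRING fix in item 1 lives on the perl/JavaScript side and is not part of the GSXML.java diff, so what follows is only a minimal Java sketch of the idea, not the committed code. It assumes the query string is built from name/value pairs; the class name and the parameter names a and metavalue are hypothetical. Encoding only the values keeps the literal = and & separators, whereas percent-encoding the already-joined string also turns them into %3D and %26. One plausible explanation for CGI.pm choking is exactly that: it splits the query string on those separators before unescaping, so a fully encoded string no longer contains any name/value boundaries.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryStringSketch {

    // URL encode a single value as UTF-8 (spaces become '+', reserved and non-ASCII bytes become %XX)
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // every JVM is required to support UTF-8, so this is not expected to happen
            throw new IllegalStateException(e);
        }
    }

    // Approach that worked: encode only the values, leaving the '=' and '&' separators literal
    public static String encodeValuesOnly(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> p : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(p.getKey()).append('=').append(enc(p.getValue()));
        }
        return sb.toString();
    }

    // Approach that failed: join the raw pairs first, then encode the whole string,
    // which also escapes the separators ('=' -> %3D, '&' -> %26)
    public static String encodeWhole(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> p : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(p.getKey()).append('=').append(p.getValue());
        }
        return enc(sb.toString());
    }

    public static void main(String[] args) {
        // hypothetical parameter names; the value holds non-ASCII chars (a-macron, e-acute)
        Map<String, String> params = new LinkedHashMap<>();
        params.put("a", "set-metadata");
        params.put("metavalue", "M\u0101ori caf\u00e9");
        System.out.println(encodeValuesOnly(params)); // a=set-metadata&metavalue=M%C4%81ori+caf%C3%A9
        System.out.println(encodeWhole(params));      // a%3Dset-metadata%26metavalue%3DM%C4%81ori+caf%C3%A9
    }
}
```

In the second output line there is no literal = or & left, which is what a parser that splits before unescaping would see.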
File:
1 edited

  • main/trunk/greenstone3/src/java/org/greenstone/gsdl3/util/GSXML.java

--- r33043
+++ r33045
@@ -1624,13 +1624,24 @@
     }
 
-    public static void elementToLogAsString(Element e, boolean indent)
-    {
-        String str = elementToString(e, indent);
+    private static void elementToLogAsString(String prefix, Element e, boolean indent, boolean debugEncoding)
+    {
+        String str = prefix + "\n" + elementToString(e, indent, debugEncoding);
         System.err.println(str);
-        logger.error(str);
+        logger.info(str);
+    }
+
+    // hex/unicode codepoint used only for those chars that are beyond printable/basic ASCII
+    public static void elementToLogAsUnicodeDebugString(String prefix, Element e, boolean indent)
+    {
+        elementToLogAsString(prefix, e, indent, true);
+    }
+
+    public static void elementToLogAsString(String prefix, Element e, boolean indent)
+    {
+        elementToLogAsString(prefix, e, indent, false);
     }
 
     // pass in debugEncoding=true to investigate encoding issues. This function will then return non-basic ASCII characters in hex
-    public static String elementToString(Element e, boolean indent, boolean debugEncoding)
+    private static String elementToString(Element e, boolean indent, boolean debugEncoding)
     {
         String str = "";
@@ -1652,5 +1663,5 @@
 
             // if debugging encoding issues, then encode unicode code pts as hex for all but non-alphanumeric and space/tab/newline chars
-            if(debugEncoding) str = Misc.stringToHex(str);
+            if(debugEncoding) str = Misc.debugUnicodeString(str);
         }
         catch (Exception ex)
@@ -1667,4 +1678,10 @@
     {
         return elementToString(e, indent, false);
+    }
+
+    // hex/unicode codepoint used only for those chars that are beyond printable/basic ASCII
+    public static String elementToUnicodeDebugString(Element e, boolean indent)
+    {
+        return elementToString(e, indent, true);
     }
 
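The diff routes the debugEncoding path of elementToString through Misc.debugUnicodeString, whose body is not shown in this changeset. The sketch below is a hypothetical stand-in that follows the behaviour described in the commit message and the diff's comments: printable ASCII plus tab/newline pass through unchanged, and every other code point is printed inside curly braces. The exact brace contents ({U+XXXX} here) are an assumption. It also mirrors the changeset's overload pattern (a private worker taking a debugEncoding flag, with public wrappers choosing the flag), but works on plain strings rather than org.w3c.dom.Element to stay self-contained; the class and method names are illustrative only.

```java
public final class UnicodeDebugSketch {

    private UnicodeDebugSketch() {}

    // Hypothetical stand-in for Misc.debugUnicodeString: printable ASCII (0x20-0x7E) and
    // tab/newline/carriage return pass through; any other code point is written as {U+XXXX}
    public static String debugUnicodeString(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            boolean printableAscii = cp >= 0x20 && cp <= 0x7E;
            boolean whitespace = cp == '\t' || cp == '\n' || cp == '\r';
            if (printableAscii || whitespace) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(String.format("{U+%04X}", cp));
            }
        });
        return sb.toString();
    }

    // Private worker with a debugEncoding flag, like elementToLogAsString/elementToString in the diff
    private static String toLogString(String prefix, String body, boolean debugEncoding) {
        return prefix + "\n" + (debugEncoding ? debugUnicodeString(body) : body);
    }

    // Public wrapper: plain output
    public static String toLogString(String prefix, String body) {
        return toLogString(prefix, body, false);
    }

    // Public wrapper: chars beyond printable/basic ASCII shown as code points
    public static String toUnicodeDebugLogString(String prefix, String body) {
        return toLogString(prefix, body, true);
    }

    public static void main(String[] args) {
        String title = "<Title>Caf\u00e9 \u2013 M\u0101ori</Title>";
        // prints "Title meta:" then "<Title>Caf{U+00E9} {U+2013} M{U+0101}ori</Title>"
        System.out.println(toUnicodeDebugLogString("Title meta:", title));
    }
}
```

Iterating with codePoints() rather than charAt() keeps characters outside the Basic Multilingual Plane as a single {U+XXXXX} entry instead of two surrogate halves.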