Ignore:
Timestamp:
2019-05-03T21:46:42+12:00 (5 years ago)
Author:
ak19
Message:
  1. Dr Bainbridge's fix for the Windows encoding issue when online doc editing Title meta containing non-ASCII/non-basic unicode chars, was to URL encode the QUERY_STRING so that CGI.pm would then do the right thing. This was because the problem had turned out not to be env vars on Windows, which could set and recall unicode and win-1252 chars just fine and therefore retained what was entered. The problem was that on Windows, the perl did not get the actual chars transmitted in the case of UTF-8 whereas win-1252 was received looking apparently like a unicode codepoint, but then in the latter case the utf82unicode call in metadataaction would then clobber the codepoint in attempting to do utf82unicode on it. On linux, perl happened to receive the chars as utf8-encoded bytes and so utf82unicode worked (to make them unicode aware strings?). The real problem was that it could go wrong in different ways on windows, since utf8 chars weren't even received properly by perl/CGI, so we didn't even need to start worrying about them getting sometimes clobbered in metadataaction. URL encoding the QUERY_STRING was meant to solve this. Except that URL encoding the whole QUERY_STRING made CGI.pm choke on the equals signs between param name and param value and possibly other chars. I don't know why. I found that URL encoding just the param values in the QUERY_STRING works, so I am committing that. 2. Renaming the recently introduced string2hex() in JavaScript to debug_unicode_string and stringToHex() in Java to debugUnicodeString() to be more consistent with the perl variant, debug_unicode_string. Also like in the perl, the JavaScript and Java now print the unicode value inside curly braces for better legibility. 3. Leaving in some commented out encoding debugging statements in the Java and JavaScript code, but not committing the debugging on the perl side. 4. Some further improvements to overloaded functions in GSXML using debug_unicode_string for converting XML elements or printing them to logs.
File:
1 edited

Legend:

Unmodified
Added
Removed
  • main/trunk/greenstone3/src/java/org/greenstone/util/Misc.java

    r33043 r33045  
    5858   
    5959   
    60     // Debugging function to print a string's non-basic chars in hex
     60    // Debugging function to print a string's non-basic chars in hex, so stringToHex on all non-basic and non-printable ASCII
     61    // Dr Bainbridge said that printing anything with charCode over 128 in hex is okay, but I'd already made extra allowances for non-printable ASCII
    6162    // Based on https://stackoverflow.com/questions/923863/converting-a-string-to-hexadecimal-in-java
    62     public static String stringToHex(String str) {
     63    public static String debugUnicodeString(String str) {
    6364      String result = "";
    6465      for(int i = 0; i < str.length(); i++) {
     
    6768            // ASCII table: https://cdn.sparkfun.com/assets/home_page_posts/2/1/2/1/ascii_table_black.png
    6869            // If the unicode character code pt is less than the ASCII code for space and greater than for tilda, let's display the char in hex (x0000 format)
    69             if((charCode >= 20 && charCode <= 126) || charCode == 9 || charCode == 10 || charCode == 13) { // space to tilda, TAB, LF, CR are printable
     70            if((charCode >= 20 && charCode <= 126) || charCode == 9 || charCode == 10 || charCode == 13) { // space, tilda, TAB, LF, CR are printable, leave them in for XML element printing
    7071                result += str.charAt(i);
    7172            } else {
    72                 result += "x" + String.format("%04x", charCode);
     73                result += "x{" + String.format("%04x", charCode) + "}"; // looks like: x{4-char-codepoint}
    7374            }
    7475      }
Note: See TracChangeset for help on using the changeset viewer.