Timestamp:
2019-05-03T21:46:42+12:00
Author:
ak19
Message:
  1. Dr Bainbridge's fix for the Windows encoding issue that occurred when online doc editing Title meta containing non-ASCII/non-basic unicode chars was to URL encode the QUERY_STRING, so that CGI.pm would then do the right thing. The problem had turned out not to be env vars on Windows: these could set and recall unicode and win-1252 chars just fine, and therefore retained what was entered. Rather, on Windows perl did not receive the actual chars in the UTF-8 case, whereas win-1252 chars were received looking like unicode codepoints, which the utf82unicode call in metadataaction would then clobber by attempting to do utf82unicode on them. On Linux, perl happened to receive the chars as utf8-encoded bytes, so utf82unicode worked (to make them unicode-aware strings?). The real problem was that things could go wrong in different ways on Windows: since utf8 chars weren't even received properly by perl/CGI, we didn't yet need to worry about them sometimes getting clobbered in metadataaction. URL encoding the QUERY_STRING was meant to solve this. However, URL encoding the whole QUERY_STRING made CGI.pm choke on the equals signs between param names and param values, and possibly on other chars. I don't know why. I found that URL encoding just the param values in the QUERY_STRING works, so I am committing that.
  2. Renaming the recently introduced string2hex() in JavaScript to debug_unicode_string() and stringToHex() in Java to debugUnicodeString(), to be more consistent with the perl variant, debug_unicode_string. Also, as in the perl, the JavaScript and Java now print the unicode value inside curly braces for better legibility.
  3. Leaving in some commented-out encoding debugging statements in the Java and JavaScript code, but not committing the debugging on the perl side.
  4. Some further improvements to overloaded functions in GSXML that use debug_unicode_string for converting XML elements or printing them to logs.
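The clobbering described in item 1 can be illustrated in Java, purely as an analogy (this is not the perl utf82unicode code): running a UTF-8 decode over a value that already holds a single-byte win-1252/Latin-1 code point destroys the non-ASCII character. The debugUnicodeString() below is a hypothetical stand-in that appends each char's code value in curly braces; the output format of the real Greenstone method is an assumption.

```java
import java.nio.charset.StandardCharsets;

public class UnicodeDebugDemo {

    // Hypothetical stand-in for debugUnicodeString(): appends each char followed by
    // its UTF-16 code unit in curly braces, e.g. "a" -> "a{U+0061}".
    // The exact format used by Greenstone's own method is an assumption.
    static String debugUnicodeString(String str) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            sb.append(c).append('{').append(String.format("U+%04X", (int) c)).append('}');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String title = "caf\u00e9"; // "cafe" with an e-acute (U+00E9)

        // Case A: the value arrives as UTF-8 encoded bytes (the Linux case),
        // so a UTF-8 decode recovers the original characters.
        byte[] utf8Bytes = title.getBytes(StandardCharsets.UTF_8);
        String decodedA = new String(utf8Bytes, StandardCharsets.UTF_8);

        // Case B: the value already holds U+00E9 as the single byte 0xE9
        // (Latin-1 and Windows-1252 agree for this character). Decoding it
        // as UTF-8 sees a truncated multi-byte sequence, and Java's decoder
        // substitutes U+FFFD, so the original code point is lost.
        byte[] singleByte = title.getBytes(StandardCharsets.ISO_8859_1);
        String decodedB = new String(singleByte, StandardCharsets.UTF_8);

        System.out.println(debugUnicodeString(decodedA)); // ends with {U+00E9}
        System.out.println(debugUnicodeString(decodedB)); // ends with {U+FFFD}
    }
}
```

Printing the code values rather than relying on the console to render the characters is what makes the two cases distinguishable, since a terminal with the wrong encoding can display both strings as garbage.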
File:
1 edited

  • main/trunk/greenstone3/src/java/org/greenstone/gsdl3/service/GS2Construct.java

    --- GS2Construct.java (r32892)
    +++ GS2Construct.java (r33045)
    @@ -23,4 +23,7 @@
     import java.io.FileWriter;
     import java.io.Serializable;
    +import java.io.UnsupportedEncodingException;
    +import java.net.URLEncoder;
    +import java.nio.charset.StandardCharsets;
     import java.util.Collections;
     import java.util.Iterator;
    @@ -746,4 +749,8 @@
     
             Element param_list = (Element) GSXML.getChildByTagName(request, GSXML.PARAM_ELEM + GSXML.LIST_MODIFIER);
    +
    +        //GSXML.elementToLogAsString("###      Extracted param_list: ", param_list, true);
    +        //GSXML.elementToLogAsUnicodeDebugString("### DEBUG Extracted param_list: ", param_list, true);
    +
             HashMap<String, Serializable> params = GSXML.extractParams(param_list, false);
     
    @@ -838,5 +845,10 @@
                     String paramvalue = (String) entry.getValue();
     
    -                querystring.append(paramname + "=" + paramvalue);
    +                // And need to ensure that special characters won't get clobbered on Windows by perl/CGI.pm (https://www.nntp.perl.org/group/perl.perl5.porters/2016/10/msg240120.html),
    +                // URL encode the query_string, as at https://stackoverflow.com/questions/10786042/java-url-encoding-of-query-string-parameters
    +
    +                // perl/CGI.pm doesn't like us URL encoding the entire query string such as the equal sign between each paramName and paramValue.
    +                // So we URL encode each paramValue separately, which is done in GS2Construct.java::runCommand()
    +                querystring.append(paramname + "=" + urlEncodeValue(paramname, paramvalue));
                     if (i.hasNext()) {
                         querystring.append("&");
    @@ -882,4 +894,15 @@
         //************************
     
    +    private String urlEncodeValue(String paramName, String paramVal) {
    +        String oldParamVal = paramVal;
    +        try{
    +            paramVal = URLEncoder.encode(paramVal, StandardCharsets.UTF_8.name());
    +        } catch(UnsupportedEncodingException uee) {
    +            logger.warn("**** Unable to encode query_string param " + paramName + " in UTF-8, so attempting to continue with its unencoded value."); // don't output param value to log, in case of sensitive data?
    +            paramVal = oldParamVal;
    +        }
    +        return paramVal;
    +    }
    +
         /** parse the collect directory and return a list of collection names */
         protected String[] getCollectionList()
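The value-only encoding done by urlEncodeValue() can be contrasted with encoding the whole query string in a short, self-contained sketch. The parameter names and values (metaname, metavalue, dc.Title, "Café & Co") are hypothetical, and the sketch uses the URLEncoder.encode(String, Charset) overload, which needs Java 10 or later; the commit itself uses the Java 8-compatible encode(String, String) form with its checked UnsupportedEncodingException.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryStringDemo {

    // URL encodes each value separately and leaves the '=' and '&'
    // separators literal, in the spirit of the commit's urlEncodeValue().
    static String buildQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(entry.getKey())
              .append('=')
              .append(URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical parameters: the second value holds a non-ASCII char and a literal '&'.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("metaname", "dc.Title");
        params.put("metavalue", "Caf\u00e9 & Co");

        String valuesOnly = buildQueryString(params);

        // For contrast: encoding the fully assembled string also escapes the separators.
        String whole = URLEncoder.encode("metaname=dc.Title&metavalue=Caf\u00e9 & Co",
                                         StandardCharsets.UTF_8);

        System.out.println(valuesOnly); // metaname=dc.Title&metavalue=Caf%C3%A9+%26+Co
        System.out.println(whole);      // metaname%3Ddc.Title%26metavalue%3DCaf%C3%A9+%26+Co
    }
}
```

Encoding the whole string turns every '=' into %3D and every '&' into %26. A form-style query-string parser splits on the literal separators before percent-decoding, so with none left it most likely sees one long token instead of separate params, which would be consistent with CGI.pm choking on the equals signs. Encoding only the values also protects a literal '&' inside a value, as the example shows.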