Changeset 31507


Timestamp:
2017-03-13T19:48:56+13:00 (7 years ago)
Author:
ak19
Message:

BUGFIX to servercontrol.pm. servercontrol::config() was merging the stderr and stdout of the wget command in order to parse the ping response: the response code and response message both go to stderr, while the HTML page's text goes to stdout. This worked fine every time I'd tested it before, such as some months back when I tested the incremental build tutorial. But the merge failed today and showed how bad an idea it was: the very line of HTML from STDOUT that was being parsed and compared against an expected value was interspersed with output from stderr. So the regex didn't match, and ultimately the collection was assumed deactivated when it was activated and vice versa. Two fixes were attempted, and I'm committing the one that worked: the wget command now stores the downloaded HTML in a file named by timestamp, which is deleted as soon as it has been read. The failed attempt was to use open3, but the online Perl manual warns about the danger of blocking when reading from both the stderr and stdout streams. I'm not sure whether that is what I encountered, but I decided against open3 and returned to the file-based version of the fix, which works.
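For reference, the blocking hazard mentioned above can usually be avoided by multiplexing the two streams with select. The following is only a minimal sketch of that idea, not the code that was committed: the function name run_keeping_streams_apart is hypothetical, the child command is a stand-in Perl one-liner rather than wget, and it assumes a Unix-like system (select on pipes is not reliable on Windows).

```perl
use strict;
use warnings;
use IPC::Open3 qw(open3);
use IO::Select;
use Symbol qw(gensym);

# Hypothetical helper: run a command and return (stdout, stderr, exit code)
# without ever merging the two streams.
sub run_keeping_streams_apart {
    my @cmd = @_;
    my ($child_in, $child_out);
    my $child_err = gensym;    # without a real handle here, open3 sends stderr to stdout
    my $pid = open3($child_in, $child_out, $child_err, @cmd);
    close($child_in);          # the child gets no input

    my %captured = (stdout => '', stderr => '');
    my $select = IO::Select->new($child_out, $child_err);

    # Read whichever stream has data ready, so a full pipe on one stream
    # can never block us while we wait on the other.
    while ($select->count) {
        for my $fh ($select->can_read) {
            my $bytes = sysread($fh, my $chunk, 4096);
            if (!$bytes) {     # EOF (0) or read error (undef)
                $select->remove($fh);
                close($fh);
                next;
            }
            my $key = ($fh == $child_out) ? 'stdout' : 'stderr';
            $captured{$key} .= $chunk;
        }
    }
    waitpid($pid, 0);
    return ($captured{stdout}, $captured{stderr}, $? >> 8);
}

# Stand-in for wget: page text on stdout, response line on stderr.
my ($page, $diag, $status) = run_keeping_streams_apart(
    $^X, '-e', 'print STDOUT "<html>ping ok</html>"; print STDERR "200 OK";'
);
print "page=[$page] diag=[$diag] status=$status\n";
```

With the streams returned separately, the page text could be matched against the expected value without stderr output landing in the middle of the line.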

File:
1 edited

  • main/trunk/greenstone2/perllib/servercontrol.pm

--- main/trunk/greenstone2/perllib/servercontrol.pm (r31488)
+++ main/trunk/greenstone2/perllib/servercontrol.pm (r31507)
@@ -140,5 +140,6 @@
 
     my $wget_file_path = &FileUtils::filenameConcatenate($ENV{'GSDLHOME'}, "bin", $ENV{'GSDLOS'}, "wget");
-
+    my $tmpfilename = time . ".html"; # random name for file wherein we'll store the HTML page retrieved by wget
+
     # https://www.gnu.org/software/wget/manual/wget.html
     # output-document set to - (STDOUT), so page is streamed to STDOUT
@@ -147,6 +148,8 @@
     # Searching for "perl backtick operator redirect stderr to stdout":
     # http://www.perlmonks.org/?node=How%20can%20I%20capture%20STDERR%20from%20an%20external%20command%3F
-    $wgetCommand = "\"$wget_file_path\" --output-document=- -T 5 -t 1 \"$library_url$wgetCommand\" 2>&1";
-    #$wgetCommand = "\"$wget_file_path\" --spider -T 5 -t 1 \"$library_url$wgetCommand\" 2>&1"; # won't save page
+    ##$wgetCommand = "\"$wget_file_path\" --spider -T 5 -t 1 \"$library_url$wgetCommand\" 2>&1"; # won't save page
+    #$wgetCommand = "\"$wget_file_path\" --output-document=- -T 5 -t 1 \"$library_url$wgetCommand\" 2>&1"; # THIS CAN MIX UP STDERR WITH STDOUT IN THE VERY LINE WE REGEX TEST AGAINST EXPECTED OUTPUT!!
+    $wgetCommand = "\"$wget_file_path\" --output-document=$tmpfilename -T 5 -t 1 \"$library_url$wgetCommand\" 2>&1"; # keep stderr (response code, response_content) separate from html page content
+
     ##print STDERR "@@@@ $wgetCommand\n";
 
@@ -185,5 +188,13 @@
 
         # check the page content is as expected
-        my $resultstr = $response_content;
+        #my $resultstr = $response_content;
+
+        open(FIN,"<$tmpfilename") or die "servercontrol.pm: Unable to open $tmpfilename to read ping response page...ERROR: $!\n";
+        my $resultstr;
+        # Read in the entire contents of the file in one hit
+        sysread(FIN, $resultstr, -s FIN);
+        close(FIN);
+        &FileUtils::removeFiles("$tmpfilename");
+
         #$resultstr =~ s@.*gs_content\"\>@@s;   ## only true for default library servlet
         #$resultstr =~ s@</div>.*@@s;
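
The change above names the temporary file from time, which has one-second resolution, so two pings started within the same second could in principle share a file name. A possible variation, sketched here under assumptions and not part of this changeset, lets the core File::Temp module pick a unique name and reads the page back with a lexical filehandle. The helper names reserve_tmp_html and slurp_and_remove are hypothetical, and a plain print stands in for the wget download so the sketch runs without a server.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: reserve a uniquely named .html file in the system
# temp directory and return its path (the file exists but is empty).
sub reserve_tmp_html {
    my ($fh, $path) = tempfile('ping_XXXXXX', SUFFIX => '.html', TMPDIR => 1);
    close($fh);
    return $path;
}

# Hypothetical helper: read the whole file in one go, then delete it,
# mirroring the open / read / close / remove sequence in the changeset.
sub slurp_and_remove {
    my ($path) = @_;
    open(my $fin, '<', $path)
        or die "Unable to open $path to read ping response page: $!\n";
    my $content = do { local $/; <$fin> };   # undef $/ means read to EOF
    close($fin);
    unlink($path) or warn "Could not remove $path: $!\n";
    return defined $content ? $content : '';
}

my $tmpfile = reserve_tmp_html();

# Stand-in for: wget --output-document=$tmpfile ... 2>&1
open(my $fout, '>', $tmpfile) or die "Cannot write $tmpfile: $!\n";
print {$fout} "<html>pong</html>\n";
close($fout);

my $resultstr = slurp_and_remove($tmpfile);
print "page: $resultstr";
```

In the real command the reserved path would be passed to wget via --output-document, which overwrites the empty file, while stderr can still be captured through 2>&1 without touching the page content.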