30.08.2017 19:29:36 (2 years ago)

Untested on Windows as yet.

1. Major overhaul of WgetDownload's useWget() and useWgetMonitored() subroutines. Their use of open3() was wrong and would block if the proxy was set wrong, or if https_proxy was not set or set wrong and the URL entered was http but resolved to https. The problem was more fundamental than the symptoms indicated: the open3() calls were used incorrectly, and the resulting blocking could be indefinite. To avoid blocking in general, we need to loop with IO::Select and check which child streams are ready. To avoid possibly indefinite blocking, we need to pass a timeout to IO::Select's can_read() method. The need for all of these and their use is explained in the links added to the committed version of this module.

2. After the use of select() worked in principle, there was still the large problem that terminating unnaturally did not stop a second wget that had been launched. Unexpectedly, this had to do with the double quotes around wget's path, which were there to preserve any spaces in the path but which behave differently with open3(): any double quotes made open3() launch a subshell to run the command passed to it. The wget launched by that subshell wasn't actually a child process of ours, so it could not be terminated by using the parent pid as a process group id when doing the kill TERM -processgroupid. The solution lay with the unexpected cause of the problem, the double quotes. The command passed to open3() is now an array of parameters with no double quotes. The array form is meant to preserve spaces in any file paths.

3. Removed the --tries=2 parameter passed to wget, since we now loop a certain number of times trying to read from the child process's streams, and each attempt can time out. If it times out n times, we give up and assume that the URL could not be read.
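The read loop described in items 1 and 3 can be sketched roughly as follows. This is a minimal illustration, not the code committed to the module: the helper name run_with_timeout(), its parameters, the 4096-byte read size and the choice to count consecutive timeouts are all assumptions made for the example. It uses the list form of open3() from item 2, so no shell is involved.

```perl
use strict;
use warnings;
use IPC::Open3;
use IO::Select;
use Symbol qw(gensym);

# Hypothetical helper (not the subroutine in the module).
# Runs @cmd in list form and collects its stdout and stderr, giving up
# after $max_timeouts consecutive can_read() timeouts.
sub run_with_timeout {
    my ($timeout_secs, $max_timeouts, @cmd) = @_;

    my ($in, $out);
    my $err = gensym;    # open3 needs a real glob to keep stderr separate
    my $pid = open3($in, $out, $err, @cmd);
    close($in);          # nothing to send to the child

    my $out_fd   = fileno($out);
    my $sel      = IO::Select->new($out, $err);
    my %buf      = (out => '', err => '');
    my $timeouts = 0;

    while ($sel->count) {
        # Wait at most $timeout_secs for any child stream to become readable.
        my @ready = $sel->can_read($timeout_secs);
        if (!@ready) {
            if (++$timeouts >= $max_timeouts) {
                kill('TERM', $pid);    # give up on this child
                last;
            }
            next;
        }
        $timeouts = 0;                 # assumption: only consecutive timeouts count
        for my $fh (@ready) {
            my $key = (fileno($fh) == $out_fd) ? 'out' : 'err';
            my $n = sysread($fh, my $chunk, 4096);
            if (!$n) {                 # EOF or read error: stop watching this handle
                $sel->remove($fh);
                close($fh);
                next;
            }
            $buf{$key} .= $chunk;
        }
    }
    waitpid($pid, 0);                  # reap the child (blocks if it ignores TERM)
    my $gave_up = ($timeouts >= $max_timeouts) ? 1 : 0;
    return ($buf{out}, $buf{err}, $gave_up);
}
```

A call might look like run_with_timeout(5, 3, 'wget', '-q', '-O', '-', $url), where the timeout of 5 seconds and the limit of 3 are placeholder values rather than the ones used in the commit.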

1 modified


  • main/trunk/greenstone2/perllib/downloaders/

    r31898  r31920
    118     118     } 
    119     119     #my $cmdWget = "-N -k -x -t 2 -P \"".$hashGeneralOptions->{"cache_dir"}."\" $strWgetOptions $strOptions ".$self->{'url'}; 
    120             my $cmdWget = "-N -k -x --tries=2 $strWgetOptions $strOptions $cache_dir " .$self->{'url'}; 
            120     #my $cmdWget = "-N -k -x --tries=2 $strWgetOptions $strOptions $cache_dir " .$self->{'url'}; 
            121     my $cmdWget = "-N -k -x $strWgetOptions $strOptions $cache_dir " .$self->{'url'}; 
    122     123     #print STDOUT "\n@@@@ RUNNING WGET CMD: $cmdWget\n\n"; 
    191     192     my $strOptions = $self->getWgetOptions(); 
    193             my $strBaseCMD = $strOptions." --tries=2 -q -O - \"$self->{'url'}\""; 
            194     #my $strBaseCMD = $strOptions." --tries=2 -q -O - \"$self->{'url'}\""; 
            195     my $strBaseCMD = $strOptions." -q -O - $self->{'url'}"; 
    195     197     #&util::print_env(STDERR, "https_proxy", "http_proxy", "ftp_proxy");
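The diff above drops the escaped double quotes around the URL. A small sketch of why quoting is unnecessary once open3() receives a list: each list element reaches the child as one argument, spaces included, and any literal quotes would be passed through as part of the argument. The child used here is a stand-in Perl one-liner that echoes its arguments, not wget, and capture_args() and the example URL are hypothetical names made up for the illustration.

```perl
use strict;
use warnings;
use IPC::Open3;
use Symbol qw(gensym);

# Stand-in for wget (an assumption for this sketch): a child Perl that
# prints each argument it receives in square brackets. The '--' stops the
# child Perl from treating later arguments as its own switches.
my @child = ($^X, '-e', 'print "[$_]" for @ARGV', '--');

# Hypothetical helper: run the stand-in child with @args via the list form
# of open3() and return whatever it printed on stdout.
sub capture_args {
    my @args = @_;
    my ($in, $out);
    my $err = gensym;
    my $pid = open3($in, $out, $err, @child, @args);
    close($in);
    my $text = do { local $/; <$out> };    # slurp all of the child's stdout
    waitpid($pid, 0);
    return defined $text ? $text : '';
}

my $url = 'http://example.org/a page';    # made-up URL containing a space

# Unquoted: the URL arrives as a single argument despite the space.
print capture_args('-q', '-O', '-', $url), "\n";

# Quoted: with no shell to strip them, the quotes become part of the argument.
print capture_args('-q', '-O', '-', "\"$url\""), "\n";
```

With the single-string form of open3(), by contrast, Perl hands a command containing shell metacharacters such as double quotes to a shell, which is the subshell behaviour item 2 of the commit message describes.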