Changeset 31713 for main

Show
Ignore:
Timestamp:
29.05.2017 19:45:42 (3 years ago)
Author:
ak19
Message:

Further filling in the sections of the Readme doc on SafeProcess?

Files:
1 modified

Legend:

Unmodified
Added
Removed
  • main/trunk/gli/src/org/greenstone/gatherer/util/Readme_Using_SafeProcess.txt

    r31711 r31713  
    156156The exit value will also be -1 if you call getExitValue() before setting off the SafeProcess via any of the runProcess() variants. 
    157157 
    158 _______________________________________________________________________ 
    159 INTERNAL IMPLEMENTATION DETAILS - USEFUL IF CUSTOMISING OR CANCELLING 
    160 _______________________________________________________________________ 
    161  
    162 doRuntimeExec 
    163 waitForStreams() 
    164 joins() discussion 
     158_________________________________________________________________________________________________________________ 
     159INTERNAL IMPLEMENTATION DETAILS - USEFUL IF CUSTOMISING SAFEPROCESS' BEHAVIOUR OR CALLING CANCEL ON SAFEPROCESS 
     160_________________________________________________________________________________________________________________ 
     161 
     162This section is for understanding how SafeProcess runs a Java Process internally. These methods are not public and their details are hidden away in the code. 
     163 
     164Calling any of the runProcess() variants does the following internally: 
     165 
     1661. calls method doRuntimeExec() 
     167 
     168    based on how you configured your SafeProcess instance, this method instantiates and returns a Process object by calling Runtime.exec(). 
     169    A possible improvement may be to consider using the ProcessBuilder class for creating a Process object. 
     170 
     1712. calls method waitForStreams() 
     172 
     173    Called by all but the runBasicProcess() variant, since the runBasicProcess() variant does not do anything with the Process' iostreams and therefore doesn't use worker threads . 
     174    This method first starts off all the worker threads, to handle each input or output stream of the Process. 
     175    Then, it calls waitFor() on the Process object. This is the first of 2 calls that block the primary thread (the Thread the Safeprocess runs its Process object in) until finished. 
     176 
     177    One of two things happen,  
     178    - either the internal Process is allowed to terminate naturally, reaching the end. 
     179    - Or the Process is interrupted by a cancel action, which will result in an InterruptedException. When an InterruptedException is caught, each of the worker threads is interrupted. The worker threads check for the interrupt flag during any loop, so that they may exit their loop early. If an InterruptException happened, the exitValue after the Process' waitFor() call remains at -1 and the SafeProcess instance sets its boolean forciblyTerminateProcess flag to true, to indicate that it may have to call Process.destroy() and ensure any subprocesses are terminated. 
     180     
     181    Regardless of whether the Process naturally terminates or is prematurely terminated by an InterruptedException upn a cancel, 
     182    - all the resources such as streams are closed. 
     183    - the join() method is called on all worker threads. As per the Java API, callin join() on a Thread "Waits for this thread to die." This means the primary thread now waits for the worker threads to all come to a halt. So the section where join() is called on each worker thread also blocks the primary thread until the joins are complete. 
     184 
     185Since both the Process.waitFor() phase and the join() phase block the primary thread until finished, an InterruptedException could theoretically have occurred during either phase. However, we don't want an InterruptedException occurring during the join() phase, as an interrupt at this point will break us out of the join (going into an exception handler for the Interruption, prematurely ending the join()) and thus the worker threads would have been prevented from neatly cleaning up after themselves. To prevent InterruptedExceptions from occurring during any of the join() calls on the worker threads, the join() calls are embedded in a section that is marked as not interruptible. Once the join calls are finished, the code in the waitForStreans() method becomes interruptible again and a notify() is sent to anyone that was waiting for the thread to become interruptible. (Technically, the remainder of the code, the rest of the primary thread, does not respond to InterruptedExceptions hereafter, as there are no more waiting/blocking calls after the join()s are over.) 
     186 
     187Marking the join() phase as an uninterruptible section of code helps when the cancelRunProcess() is called on the SafeProcess instance by a separate thread, such as a GUI thread with a cancel button that triggered the call to cancelRunProcess(). The cancelRunProcess() method only interrupts the primary SafeProcess thread if it is in an interruptible section of code. If the primary thread is uninterruptible when cancelRunProcess() was called, then the method either 
     188- returns immediately if called by a GUI thread (EventDispatchThread), 
     189- or if cancelRunProcess(boolean forceWaitUntilInterruptible) is called with forceWaitUntilInterruptible, even if the caller is the EventDispatchThread, it will wait until the primary thread is interruptible before it returns from the cancelRunProcess() method. The wait() call blocks until notify()ed that the primary thread has become interruptible again. 
     190 
     191The cancelRunningProcess() method, which is called by a Thread external to the primary SafeProcess thread, therefore only causes an InterruptedException on the primary SafeProcess thread. It never sends an interruption during or after the primary thread is in an uninterruptible section, since only the waitFor() phase is ever to be interrupted, as waitFor() is the only phase when the external process is actually running, while the subsequent phases are already in closing down naturally and only need to do cleanup thereafter. If cancelRunningProcess() is called when the primary thread is in the uninterruptible phase after waitFor(), the cancelRunningProcess() method returns straight away or can be told to wait until the uninterupptible phase is done. Either way, it won't cause an interrupt any more, as the process is already naturally closing down. 
     192 
     193 
     1943. If and only if SafeProcess was cancelled, so that its Process is to be prematurely terminated, does SafeProcess call destroyProcess(). (It is not called on natural termination.) destroyProcess() ensures both the external process that is run, denoted by the Process object, and also any subprocesses that are launched by the external process are all terminated. 
     195 
     196- On Linux (only tested on Ubuntu), it was found that the simple sequence of events from interrupting the worker threads to the join()s called on them (which closed off al resources used by the worker threads) was sufficient to cleanly end the Process and any subprocesses. 
     197 
     198- On Windows, it needs to call Process.destroy() to terminate the Process and further needs to obtain the childids of any subprocesses this launched and terminate them recursively in a Windows-specific manner by calling the Windows specific kill command on each process id.  
     199The method "public static long getProcessID(Process p)" uses the newly included Java Native Access library to obtain the processID of any Process. Then the childpids of any process denoted by pid are obtained with the command: 
     200 
     201    "wmic process where (parentprocessid=<pid>) get processid" 
     202 
     203Having the pids of all processes to be terminated, one of "wmic process <pid> delete", "taskkill /f /t /PID <pid>" or "tskill pid" commands is invoked, whichever is available, to recursively and forcibly terminate any process and its subprocesses, by their pid. The taskkill command works like kill -9 (kill -KILL) rather than like kill -TERM. 
     204 
     205The wmic prompt-like tool, which has been present since Windows XP, not only recongises the command to delete a process by pid, but it also has commands to get the childids of a process denoted by pid, and vice-versa (to find the parent pid if you know a childpid). Much more is possible with wmic. The tskill command has also been available for a long time, taskkill is newer and may have appeared with Windows Vista or later. 
     206 
     207- On Mac, pressing cancel when building a collection in GLI launches the perl processes full-import.pl followed by full-buildcol.pl. The buildcol phase terminates even without a process.destroy() being necessary, which is as on Linux. However, for full-import.pl, process.destroy() firstly is necessary: as always, process.destroy() is able to destroy the Process that was launched (full-import.pl), but not any subprocesses that this launched in turn (such as import.pl). So on Mac, during destroyProcess(), an operating system command is issued to terminate the process and its subprocesses. See the NOTE at the end of the section. 
     208 
     209On Linux, SafeProcess() does not issue operating system commands to terminate the internal Process upon cancel, since on Linux both a process and its subprocesses seem to come to an end on their own in such a case. Nevertheless, SafeProcess now also includes code to send an OS level command to terminate a process with its subprocesses on Linux too. The way SafeProcess works on premature termination, with interrupts and closing of resources, doesn't yet require an OS level command. However, there's now a variant of destroyProcess() which takes a boolean parameter which, if false, will force an OS level command to be issued on Linux too if an entire process group (process and subprocesses) is to be destroyed. 
     210 
     211On Mac, the command is "pkill -TERM -P <pid>". This is to kill a process and subprocesses. It was tested successfully on Mac Snow Leopard. 
     212 
     213However, on Linux, this command only terminates the top level process, while any subprocesses are left running. On Linux, the OS level command to terminate the entire process group is "kill -TERM -<pid>", which has been tested successfully on Ubuntu. Note the hyphen in front of the <pid>, which treats the pid as a process group id, and terminates the entire process tree. (In contrast, the more common "kill -TERM <pid>" only terminates the top level process on Mac and Linux, leaving subprocesses running). 
     214 
     215For more information, see https://stackoverflow.com/questions/8533377/why-child-process-still-alive-after-parent-process-was-killed-in-linux 
     216 
     217NOTE: 
     218This note documents weird behaviour on Mac, which has not been explained yet, but which the above operating system commands to terminate the process group successfully taken care of anyway. 
     219 
     220When running the full-import.pl and full-buildcol.pl in sequence from the command line on a Mac, the regular "kill -TERM <pid> signal" only terminates the top level process and not subprocesses. To kill the entire process group, the OS specific command for killing a process (mentioned above) must be issued instead. 
     221When full-import.pl is run from GLI, both process.destroy() and the regular "kill -TERM <pid> signal" are able to successfully terminate the top level process, but not subprocesses. In this, sending the TERM signal results in behaviour similar to the situation with running full-import.pl from the command line. 
     222 
     223However, when full-buildcol.pl is run from GLI on a Mac the behaviour is inconsistent with the command line and with running full-import.pl from GLI. When full-buildcol.pl is run from GLI, the Process and subprocesses all terminate tidily on their own when cancel is pressed, not requiring either process.destroy() or the OS command to terminate even the top level process ID, nor requiring the command to terminate the process group either. This behaviour is similar to how both full-import.pl and full-buildcol.pl scripts come to a clean close upon cancel on Linux. In such cases, sending a TERM signal to the process (group) id returns an exitvalue of 1. This exitvalue can mean that the kill signal didn't work, but in these cases, it specifically indicates that the process was already terminated before the TERM signal was even sent to kill the process. 
     224 
     225A specific detail of the process of killing a process group on Linux and Mac is that a TERM signal is sent first. If the return value was 0, a signal was successfully sent. If it is 1 it failed in some way, including the possibility that process was already terminated. If neither of these, then it's assumed the return value indicates the termination signal was unsuccessful, and a KILL signal is sent next.  
     226 
     227Preliminary investigation of the weird behaviour on Mac has revealed the following: 
     228a. Sending a Ctrl-C is different in two ways from sending a kill -TERM signal. 
     229- Ctrl-C is "kill -2" or "kill -INT", whereas "kill -TERM" is the same as "kill -15" (and "kill -KILL" is "kill -9", which doesn't allow the killed process to do cleanup first). 
     230- Ctrl-C kills the foreground process of the progress group targeted by the Ctrl-C (or the process group indicated by pid). In contrast, kill -TERM or kill -KILL terminate the process with the given pid, which is likely to be a background process if any subprocesses were launched by it. 
     231 
     232b. Pressing Ctrl-C when full-import.pl is running may give the impression of having terminated the entire process group if you happen to issue the Ctrl-C at a point when full-import.pl hasn't launched any subprocesses. The parent process is killed, and therefore can't launch further subprocesses, so it looks like hte entire process tree was successfully terminated. However, if you issue Ctrl-C when a subprocess is running, that subprocess (the foreground process) is all that's terminated, while background processes are still running and can continue to launch subprocesses. 
     233 
     234Sending a regular kill -TERM or kill -KILL signal to the parent process of a process group will kill the parent process (but not the process group, unless you explicitly issued the command to terminate an entire process group). Killing the parent process will prevent future subprocesses from being launched by the parent process, but child processes remain runnning. 
     235 
     236The Java method Process.destroy(), used by GLI until the recent modifications to ensure the entire process group is terminated, behaves more like kill -TERM/kill -KILL than Ctrl-C: the actuall parent Process launched with Runtime.exec() is killed, but not subprocesses. This means that Ctrl-C is not a proper test of GLI's behaviour. 
     237 
     238Future investigation into the weird behaviour, if this ever needs to be explained, can proceed from the above. At present. issuing operating system commands to kill an entire process tree seem to work sufficiently in all cases tested, despite the weird distinctive behaviour of full-buildcol.pl on a Mac when run from GLI. 
    165239 
    166240__________________________________________________ 
     
    368442USEFUL SYNCHRONISATION NOTES 
    369443__________________________________________________ 
    370  
     444[Also add in links] 
     445 
     446 
     447__________________________________________________ 
     448TICKETS, COMMITS, TAGS 
     449__________________________________________________ 
     450 
     451