2017-05-29T19:45:42+12:00 (7 years ago)

Further filling in the sections of the Readme doc on SafeProcess

1 edited


  • main/trunk/gli/src/org/greenstone/gatherer/util/Readme_Using_SafeProcess.txt

    r31711 r31713  
    156156The exit value will also be -1 if you call getExitValue() before setting off the SafeProcess via any of the runProcess() variants.
    158 _______________________________________________________________________
    160 _______________________________________________________________________
    162 doRuntimeExec
    163 waitForStreams()
    164 joins() discussion
     162This section is for understanding how SafeProcess runs a Java Process internally. These methods are not public and their details are hidden away in the code.
     164Calling any of the runProcess() variants does the following internally:
     1661. calls method doRuntimeExec()
     168    based on how you configured your SafeProcess instance, this method instantiates and returns a Process object by calling Runtime.exec().
     169    A possible improvement may be to consider using the ProcessBuilder class for creating a Process object.
     1712. calls method waitForStreams()
     173    Called by all but the runBasicProcess() variant, since the runBasicProcess() variant does not do anything with the Process' iostreams and therefore doesn't use worker threads .
     174    This method first starts off all the worker threads, to handle each input or output stream of the Process.
     175    Then, it calls waitFor() on the Process object. This is the first of 2 calls that block the primary thread (the Thread the Safeprocess runs its Process object in) until finished.
     177    One of two things happen,
     178    - either the internal Process is allowed to terminate naturally, reaching the end.
     179    - Or the Process is interrupted by a cancel action, which will result in an InterruptedException. When an InterruptedException is caught, each of the worker threads is interrupted. The worker threads check for the interrupt flag during any loop, so that they may exit their loop early. If an InterruptException happened, the exitValue after the Process' waitFor() call remains at -1 and the SafeProcess instance sets its boolean forciblyTerminateProcess flag to true, to indicate that it may have to call Process.destroy() and ensure any subprocesses are terminated.
     181    Regardless of whether the Process naturally terminates or is prematurely terminated by an InterruptedException upn a cancel,
     182    - all the resources such as streams are closed.
     183    - the join() method is called on all worker threads. As per the Java API, callin join() on a Thread "Waits for this thread to die." This means the primary thread now waits for the worker threads to all come to a halt. So the section where join() is called on each worker thread also blocks the primary thread until the joins are complete.
     185Since both the Process.waitFor() phase and the join() phase block the primary thread until finished, an InterruptedException could theoretically have occurred during either phase. However, we don't want an InterruptedException occurring during the join() phase, as an interrupt at this point will break us out of the join (going into an exception handler for the Interruption, prematurely ending the join()) and thus the worker threads would have been prevented from neatly cleaning up after themselves. To prevent InterruptedExceptions from occurring during any of the join() calls on the worker threads, the join() calls are embedded in a section that is marked as not interruptible. Once the join calls are finished, the code in the waitForStreans() method becomes interruptible again and a notify() is sent to anyone that was waiting for the thread to become interruptible. (Technically, the remainder of the code, the rest of the primary thread, does not respond to InterruptedExceptions hereafter, as there are no more waiting/blocking calls after the join()s are over.)
     187Marking the join() phase as an uninterruptible section of code helps when the cancelRunProcess() is called on the SafeProcess instance by a separate thread, such as a GUI thread with a cancel button that triggered the call to cancelRunProcess(). The cancelRunProcess() method only interrupts the primary SafeProcess thread if it is in an interruptible section of code. If the primary thread is uninterruptible when cancelRunProcess() was called, then the method either
     188- returns immediately if called by a GUI thread (EventDispatchThread),
     189- or if cancelRunProcess(boolean forceWaitUntilInterruptible) is called with forceWaitUntilInterruptible, even if the caller is the EventDispatchThread, it will wait until the primary thread is interruptible before it returns from the cancelRunProcess() method. The wait() call blocks until notify()ed that the primary thread has become interruptible again.
     191The cancelRunningProcess() method, which is called by a Thread external to the primary SafeProcess thread, therefore only causes an InterruptedException on the primary SafeProcess thread. It never sends an interruption during or after the primary thread is in an uninterruptible section, since only the waitFor() phase is ever to be interrupted, as waitFor() is the only phase when the external process is actually running, while the subsequent phases are already in closing down naturally and only need to do cleanup thereafter. If cancelRunningProcess() is called when the primary thread is in the uninterruptible phase after waitFor(), the cancelRunningProcess() method returns straight away or can be told to wait until the uninterupptible phase is done. Either way, it won't cause an interrupt any more, as the process is already naturally closing down.
     1943. If and only if SafeProcess was cancelled, so that its Process is to be prematurely terminated, does SafeProcess call destroyProcess(). (It is not called on natural termination.) destroyProcess() ensures both the external process that is run, denoted by the Process object, and also any subprocesses that are launched by the external process are all terminated.
     196- On Linux (only tested on Ubuntu), it was found that the simple sequence of events from interrupting the worker threads to the join()s called on them (which closed off al resources used by the worker threads) was sufficient to cleanly end the Process and any subprocesses.
     198- On Windows, it needs to call Process.destroy() to terminate the Process and further needs to obtain the childids of any subprocesses this launched and terminate them recursively in a Windows-specific manner by calling the Windows specific kill command on each process id.
     199The method "public static long getProcessID(Process p)" uses the newly included Java Native Access library to obtain the processID of any Process. Then the childpids of any process denoted by pid are obtained with the command:
     201    "wmic process where (parentprocessid=<pid>) get processid"
     203Having the pids of all processes to be terminated, one of "wmic process <pid> delete", "taskkill /f /t /PID <pid>" or "tskill pid" commands is invoked, whichever is available, to recursively and forcibly terminate any process and its subprocesses, by their pid. The taskkill command works like kill -9 (kill -KILL) rather than like kill -TERM.
     205The wmic prompt-like tool, which has been present since Windows XP, not only recongises the command to delete a process by pid, but it also has commands to get the childids of a process denoted by pid, and vice-versa (to find the parent pid if you know a childpid). Much more is possible with wmic. The tskill command has also been available for a long time, taskkill is newer and may have appeared with Windows Vista or later.
     207- On Mac, pressing cancel when building a collection in GLI launches the perl processes full-import.pl followed by full-buildcol.pl. The buildcol phase terminates even without a process.destroy() being necessary, which is as on Linux. However, for full-import.pl, process.destroy() firstly is necessary: as always, process.destroy() is able to destroy the Process that was launched (full-import.pl), but not any subprocesses that this launched in turn (such as import.pl). So on Mac, during destroyProcess(), an operating system command is issued to terminate the process and its subprocesses. See the NOTE at the end of the section.
     209On Linux, SafeProcess() does not issue operating system commands to terminate the internal Process upon cancel, since on Linux both a process and its subprocesses seem to come to an end on their own in such a case. Nevertheless, SafeProcess now also includes code to send an OS level command to terminate a process with its subprocesses on Linux too. The way SafeProcess works on premature termination, with interrupts and closing of resources, doesn't yet require an OS level command. However, there's now a variant of destroyProcess() which takes a boolean parameter which, if false, will force an OS level command to be issued on Linux too if an entire process group (process and subprocesses) is to be destroyed.
     211On Mac, the command is "pkill -TERM -P <pid>". This is to kill a process and subprocesses. It was tested successfully on Mac Snow Leopard.
     213However, on Linux, this command only terminates the top level process, while any subprocesses are left running. On Linux, the OS level command to terminate the entire process group is "kill -TERM -<pid>", which has been tested successfully on Ubuntu. Note the hyphen in front of the <pid>, which treats the pid as a process group id, and terminates the entire process tree. (In contrast, the more common "kill -TERM <pid>" only terminates the top level process on Mac and Linux, leaving subprocesses running).
     215For more information, see https://stackoverflow.com/questions/8533377/why-child-process-still-alive-after-parent-process-was-killed-in-linux
     218This note documents weird behaviour on Mac, which has not been explained yet, but which the above operating system commands to terminate the process group successfully taken care of anyway.
     220When running the full-import.pl and full-buildcol.pl in sequence from the command line on a Mac, the regular "kill -TERM <pid> signal" only terminates the top level process and not subprocesses. To kill the entire process group, the OS specific command for killing a process (mentioned above) must be issued instead.
     221When full-import.pl is run from GLI, both process.destroy() and the regular "kill -TERM <pid> signal" are able to successfully terminate the top level process, but not subprocesses. In this, sending the TERM signal results in behaviour similar to the situation with running full-import.pl from the command line.
     223However, when full-buildcol.pl is run from GLI on a Mac the behaviour is inconsistent with the command line and with running full-import.pl from GLI. When full-buildcol.pl is run from GLI, the Process and subprocesses all terminate tidily on their own when cancel is pressed, not requiring either process.destroy() or the OS command to terminate even the top level process ID, nor requiring the command to terminate the process group either. This behaviour is similar to how both full-import.pl and full-buildcol.pl scripts come to a clean close upon cancel on Linux. In such cases, sending a TERM signal to the process (group) id returns an exitvalue of 1. This exitvalue can mean that the kill signal didn't work, but in these cases, it specifically indicates that the process was already terminated before the TERM signal was even sent to kill the process.
     225A specific detail of the process of killing a process group on Linux and Mac is that a TERM signal is sent first. If the return value was 0, a signal was successfully sent. If it is 1 it failed in some way, including the possibility that process was already terminated. If neither of these, then it's assumed the return value indicates the termination signal was unsuccessful, and a KILL signal is sent next.
     227Preliminary investigation of the weird behaviour on Mac has revealed the following:
     228a. Sending a Ctrl-C is different in two ways from sending a kill -TERM signal.
     229- Ctrl-C is "kill -2" or "kill -INT", whereas "kill -TERM" is the same as "kill -15" (and "kill -KILL" is "kill -9", which doesn't allow the killed process to do cleanup first).
     230- Ctrl-C kills the foreground process of the progress group targeted by the Ctrl-C (or the process group indicated by pid). In contrast, kill -TERM or kill -KILL terminate the process with the given pid, which is likely to be a background process if any subprocesses were launched by it.
     232b. Pressing Ctrl-C when full-import.pl is running may give the impression of having terminated the entire process group if you happen to issue the Ctrl-C at a point when full-import.pl hasn't launched any subprocesses. The parent process is killed, and therefore can't launch further subprocesses, so it looks like hte entire process tree was successfully terminated. However, if you issue Ctrl-C when a subprocess is running, that subprocess (the foreground process) is all that's terminated, while background processes are still running and can continue to launch subprocesses.
     234Sending a regular kill -TERM or kill -KILL signal to the parent process of a process group will kill the parent process (but not the process group, unless you explicitly issued the command to terminate an entire process group). Killing the parent process will prevent future subprocesses from being launched by the parent process, but child processes remain runnning.
     236The Java method Process.destroy(), used by GLI until the recent modifications to ensure the entire process group is terminated, behaves more like kill -TERM/kill -KILL than Ctrl-C: the actuall parent Process launched with Runtime.exec() is killed, but not subprocesses. This means that Ctrl-C is not a proper test of GLI's behaviour.
     238Future investigation into the weird behaviour, if this ever needs to be explained, can proceed from the above. At present. issuing operating system commands to kill an entire process tree seem to work sufficiently in all cases tested, despite the weird distinctive behaviour of full-buildcol.pl on a Mac when run from GLI.
     444[Also add in links]
Note: See TracChangeset for help on using the changeset viewer.