Apache Child Processes Lifecycle and Request Processing

Apache Child Process Lifecycle

Apache Rivet delegates the task of managing the agents that respond to network requests to the Multi-Processing Module (MPM). An MPM is responsible for creating these agents during start-up, and is in charge of terminating existing agents and creating new ones when the workload requires it.

Apache Rivet currently supports only the prefork MPM, which creates full-fledged child processes as independent agents responding to network requests. Efforts are under way to extend support to the worker MPM, a hybrid model where forked child processes in turn create threads as the real network agents. Achieving this goal would also open the possibility of supporting the Windows-specific winnt MPM, where a single process creates and manages a large number of thread agents.

The configuration parameters that control this behavior are described in the Apache documentation.

There are 4 stages in the lifetime of an Apache webserver that are relevant to Rivet:

  1. Single Process Initialization

    Apache starts up as a single process. During this stage Apache performs various preliminary tasks, including reading and parsing the configuration. After the configuration has been read, Rivet sets up some internal resources and, if a Tcl script is set as the argument of a ServerInitScript directive, executes that script. Variables, arrays or dictionaries created during the execution of this script are preserved and later replicated in the child process interpreters, since the prefork MPM creates new child processes with a fork() system call (which involves only an in-memory copy of sections of a process address space). ServerInitScript is thus a good place to do global initialization that doesn't involve the creation of private data. Examples of tasks that can be done in this context are importing namespace commands and loading packages that provide code of general interest for every application to be served. IPC methods can also be initialized in this stage.
  2. Child Process Initialization

    Right after the webserver has forked its child processes there is a chance to perform specific initialization of their Tcl interpreters. This is the stage where you will most likely want to open I/O channels, database connections or any other resource that has to be private to an interpreter. When the option SeparateVirtualInterps is turned off, child processes have a single interpreter regardless of the number of virtual hosts configured. The GlobalInitScript is the configuration script the child process runs once before getting ready to serve requests.
    When SeparateVirtualInterps is turned on, each configured virtual host has its own slave interpreter, which runs the script set with the ChildInitScript directive as its initialization script. The ChildInitScript has to be placed within a <VirtualHost ...>...</VirtualHost> stanza to associate a script with the initialization of a specific virtual host. This scenario of interpreter separation is extremely useful to prevent resource conflicts when different virtual hosts are serving different web applications.
    GlobalInitScript has no effect on working interpreters when SeparateVirtualInterps is set.
  3. Request Processing and Content Generation

    After a child has been initialized it is ready to serve requests. A child process' lifetime is almost entirely spent in this phase, waiting for connections and responding to requests. At every request the URL goes through filter processing and, where applicable, is rewritten (mod_rewrite, Alias directives, etc.). Parameter values encoded in the request are made available to the environment and finally the script encoded in the URL is run. The developer can optionally tell Rivet to precede the execution with a BeforeScript and to follow it with an AfterScript. The script mod_rivet actually executes is the concatenation of the BeforeScript, the script encoded in the URL and the AfterScript. The whole ensemble of code that makes up a web application can therefore run within the same "before" and "after" scripts, to which the programmer can delegate tasks common to every page of an application.
  4. Child Process Exit

    If no error condition forces the child process to a premature exit, its lifetime is determined by the Apache configuration parameters. To reduce the effects of memory leaks in buggy applications, the Apache webserver forces a child process to exit after it has served a certain number of requests. A child process gets replaced with a brand new one if the workload of the webserver requires it. Before the process quits, an exit handler can be run to do some housekeeping, just in case something that could have been left behind has to be cleaned up. Like the initialization scripts, ChildExitScript is a "one shot" script.
    The Tcl exit command forces an interpreter to quit, thus removing the ability of the process embedding it to run more Tcl scripts. The child process is then forced to exit and is replaced by a new one when the workload demands it. In this case too the ChildExitScript is run before the interpreter is actually deleted.
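
The four stages can be walked through in plain Tcl. The following is only a sketch, meant to run in an ordinary tclsh (Tcl 8.6 assumed for file tempfile) rather than under mod_rivet: the ::myapp namespace, its variables and the three request scripts are hypothetical names chosen for illustration, and the temporary file stands in for whatever private resource a real application would open. In a real setup each fragment would be the body of the script set with the directive named in its comment.

```tcl
# Stage 1: the kind of work a ServerInitScript might do. It runs once in the
# parent process, so only data that is safe to replicate in every child
# belongs here.
namespace eval ::myapp {
    variable settings [dict create site_name "Example Site" page_size 20]
}

# Stage 2: the kind of work a GlobalInitScript or ChildInitScript might do.
# It runs once per child, so private resources are opened here. A temporary
# file stands in for a log channel or a database connection.
namespace eval ::myapp {
    variable log [file tempfile ::myapp::logpath]
}

# Stage 3: the script evaluated for a request is the BeforeScript, the
# requested script and the AfterScript joined together.
set before_script {set page_size [dict get $::myapp::settings page_size]}
set page_script   {set body "Welcome to [dict get $::myapp::settings site_name]"}
set after_script  {puts $::myapp::log "served a page listing $page_size items"}
eval [join [list $before_script $page_script $after_script] "\n"]

# Stage 4: the kind of housekeeping a ChildExitScript might do before the
# interpreter is deleted. The sketch also removes its temporary file; a real
# application would normally keep its log.
close $::myapp::log
file delete $::myapp::logpath
```

Because the three request scripts are evaluated as one script, a variable set by the "before" part (page_size here) is visible to the "after" part.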

Apache Rivet Error and Exception Scripts Directives

Rivet is highly configurable and each of the webserver lifecycle stages can be exploited to control a web application. Not only can the orderly sequence of stages in a child lifecycle be controlled with Tcl scripts, but Tcl errors or abnormal conditions taking place during the execution can also be caught and handled with specific scripts.

Tcl errors (conditions generated when a command exits with code TCL_ERROR) usually result in the printing of a backtrace of the code fragment relevant to the error. Rivet can be set up to trap these errors and run an ErrorScript instead. The ErrorScript handles the error and conceals the backtrace, which usually is of no interest to the end user and may show lines of code that ought to remain private. The ErrorScript handler might create a polite error page where things are explained in human readable form, thus enabling the end user to provide meaningful feedback.
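
The idea can be sketched in plain Tcl, outside mod_rivet. The example below is a simulation, not Rivet's own error handling: it assumes the backtrace is available in Tcl's standard ::errorInfo variable, and page_script, error_script, private_log and page are hypothetical names. The failing script stands in for a page, and the handler keeps the backtrace private while building a page that exposes no source code.

```tcl
# Stand-in for a page script that fails with TCL_ERROR.
set page_script {
    set rows [dict get {a 1} missing_key]
}

# Stand-in for the body of an ErrorScript: keep the backtrace where only the
# developer can see it and build a page that shows none of it.
set error_script {
    set private_log $::errorInfo
    set page "<html><body><h1>Something went wrong</h1>"
    append page "<p>Please tell us what you were doing when this page appeared.</p>"
    append page "</body></html>"
}

# Trap the error and run the handler instead of printing the backtrace.
if {[catch {uplevel #0 $page_script}]} {
    uplevel #0 $error_script
}
```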

In other cases an unmanageable condition might take place in the data, and this could demand an immediate interruption of the content generation. These abort conditions can be fired by the abort_page command, which in turn fires the execution of an AbortScript to handle the abnormal condition. Starting with Rivet 2.1.0 abort_page accepts a free form parameter that can be retrieved later with the command abort_code.
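
A minimal sketch of how the two commands might work together is shown below. It only runs under mod_rivet, and it assumes the Rivet commands are reachable by their unqualified names. The dictionary layout (reason, name) and the id parameter are hypothetical choices made for the example; since the parameter is free form, any other value would do.

```tcl
# Fragment of a page script: interrupt content generation when a required
# request parameter is missing, passing along a description of the problem.
if {![var exists id]} {
    abort_page [dict create reason missing_parameter name id]
}

# Body of an AbortScript: retrieve the value given to abort_page and decide
# what to do with it.
set code [abort_code]
if {[dict exists $code reason]} {
    set reason [dict get $code reason]
} else {
    set reason unknown
}
```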

Tcl Namespaces in Rivet and the ::request Namespace

With the sole exception of .rvt templates, Rivet runs pure Tcl scripts in the global namespace. That means that every variable or procedure created in Tcl scripts resides by default in the "::" namespace (just like in traditional Tcl scripting) and persists across different requests until explicitly unset or until the interpreter is deleted. You can create your own application namespaces to store data, but it is important to remember that subsequent requests will in general be served by different child processes. Your application can rely on the fact that certain application data will be in the interpreter, but you shouldn't assume that the state of a transaction spanning several pages can be stored in this way and be safely kept available to a specific client. Sessions exist for this purpose, and Rivet ships its own session package with support for most popular DBMSs. Nonetheless storing data in the global namespace can be useful, even though scoping data in a namespace is recommended. I/O channels and database connections are examples of resources that are usually specific to a process and that you don't want to create at every request, since the overhead would probably cause a dramatic loss in application performance.
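
The following plain Tcl sketch illustrates the pattern of keeping a per-process resource in an application namespace. The ::myapp namespace and its procedures are hypothetical, and open_connection is only a stand-in for an expensive call such as opening a database connection.

```tcl
namespace eval ::myapp {
    variable db ""           ;# cached handle, one per interpreter
    variable connections 0   ;# how many times a connection was opened

    # Stand-in for an expensive operation such as a database connect.
    proc open_connection {} {
        variable connections
        incr connections
        return "handle$connections"
    }

    # Return the cached handle, creating it on first use only.
    proc db_handle {} {
        variable db
        if {$db eq ""} {
            set db [open_connection]
        }
        return $db
    }
}

# Two "requests" served by the same interpreter reuse the same handle.
set first  [::myapp::db_handle]
set second [::myapp::db_handle]
```

The handle lives as long as the interpreter does, so it is shared by the requests one child serves but never by requests served in different child processes.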

A special role in the interpreter is played by the ::request namespace. The ::request namespace is deleted and recreated at every request and Rivet templates (.rvt files) are executed within it.

Unless you're fully qualifying variable names outside the ::request namespace, every variable and procedure created in .rvt files is by default placed in it and deleted before any other request gets processed. It is therefore safe to create variables or object instances in template files and forget about them: Rivet will take care of cleaning the namespace up and everything created inside the namespace will be destroyed.
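
The effect can be imitated in an ordinary tclsh. The sketch below is a simulation under stated assumptions, not Rivet's implementation: simulate_request is a hypothetical helper that evaluates a script in ::request and then deletes the namespace, and ::hits and page_title are illustrative names.

```tcl
set ::hits 0

# Hypothetical helper: evaluate a script the way a template would be run,
# inside ::request, then throw the namespace away.
proc simulate_request {script} {
    namespace eval ::request $script
    namespace delete ::request
}

simulate_request {
    set page_title "Hello"   ;# unqualified: created in ::request
    incr ::hits              ;# fully qualified: lives outside ::request
}

# The unqualified variable went away with the namespace, the fully qualified
# one is still there.
set title_survived [info exists ::request::page_title]
set hits_survived  [info exists ::hits]
```

After the simulated request title_survived is 0 and hits_survived is 1.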

Stage                   Script            Namespace
Apache Initialization   ServerInitScript  ::
Child Initialization    GlobalInitScript  ::
Request Processing      BeforeScript      ::
Child Termination       ChildExitScript   ::
Error Handling          ErrorScript       ::