Rivet Internals

This section easily falls out of date, as new code is added, old code is removed, and changes are made. The best place to look is the source code itself. If you are interested in the changes themselves, the Subversion revision control system (svn) can provide you with information about what has been happening with the code.

Rivet approach to Apache Multiprocessing Models

The Apache HTTP web server has an extremely modular architecture that made it very popular among web developers. Most of the server features can be implemented in external modules, including some of the ways the server interfaces with the operating system. The multiprocessing modules are meant to provide different models for distributing the server workload, but also to cope with different operating systems having their specific architectures and services.

From the very beginning mod_rivet was designed to work with the prefork MPM (Multi-Processing Module), which assumes the OS has 'fork' capabilities. This prerequisite basically restricted mod_rivet to Unix-like operating systems. Starting with version 3.0 we reorganized mod_rivet to offer a design that could work together with more MPMs and hopefully pave the way to supporting operating systems that have no 'fork' call. At the same time we tried to preserve some of the basic features of mod_rivet when working with the prefork MPM, chiefly the ability of the Unix fork system call to 'clone' a parent process's memory into its child, thus allowing fast initialization of interpreters.

The central design of mod_rivet now relies on the idea of MPM bridges: loadable modules that are responsible for adapting the module's procedural design to a given class of Apache MPMs. This design is open to the development of more MPM bridges coping with different multi-processing models, but also to the development of different approaches to resource consumption and workload balance. Currently we have 3 bridges:

  • rivet_prefork_mpm.c: a bridge for the prefork MPM
  • rivet_worker_mpm.c: a threaded bridge creating a pool of threads each running Tcl interpreters and communicating with the worker MPM threads through a thread safe queue. This bridge is needed by the worker MPM.
  • rivet_lazy_mpm.c: a threaded bridge where Tcl threads are started on demand. The bridge creates no threads or Tcl interpreters at startup; Tcl execution threads are created only when requests come in. This bridge is explained in detail in the section called “Example: the Lazy bridge”. Since the resource demand at startup is minimal, this bridge should suit development machines that go through frequent web server restarts.

mod_rivet MPM Bridge callbacks

A bridge is a loadable library implementing different ways to handle specific features needed by mod_rivet. It was originally meant as a way to handle the prefork/worker/event MPM specificities that prevented mod_rivet from supporting each of them, while avoiding the need to stuff the code with conditional statements that would have implied useless complexity (an instance of the Apache web server can run only one MPM at a time), error-prone programming and performance costs. New bridges could also be imagined to implement different models of workload and resource management (like the resources demanded by the Tcl interpreters). We designed an interface between the core of mod_rivet and its MPM bridges based on a set of functions defined in the rivet_bridge_table structure.

typedef struct _mpm_bridge_table {
    RivetBridge_ServerInit    *mpm_server_init;
    RivetBridge_ChildInit     *mpm_child_init;
    RivetBridge_Request       *mpm_request;
    RivetBridge_Finalize      *mpm_finalize;
    RivetBridge_Exit_Handler  *mpm_exit_handler;
    RivetBridge_Thread_Interp *mpm_thread_interp;
} rivet_bridge_table;

  • mpm_server_init: pointer to any bridge-specific server initialization function. This field can be NULL if no bridge-specific initialization has to be done. The core of mod_rivet runs the ServerInitScript before calling this function.
  • mpm_child_init: Bridge specific child process initialization. If the pointer is assigned with a non-NULL value the function is called by Rivet_ChildInit.
  • mpm_request: This pointer must be a valid function pointer to the content generator implemented by the bridge. If the pointer is not defined the Apache web server will stop at startup. This requirement avoids testing the pointer on every request. The fundamental purpose of a content generator module (like mod_rivet) is to respond to requests by creating content, so a content generating function must exist in any case (during the early stages of development you can create a simple test function for that). In a threaded MPM this function typically prepares the request processing by storing the pointer to the request_rec structure passed by the web server, then calls some method to communicate this data to the Tcl execution thread and waits for the result to be returned. The prefork bridge is an exception: since there are no threads, the bridge calls Rivet_SendContent directly.
  • mpm_finalize: pointer to a finalization function called during a child process exit. This function is registered as a cleanup function of the child process memory pool. If the pointer is NULL the pool is given a default cleanup function (apr_pool_cleanup_null) defined in src/mod_rivet/mod_rivet.c. For instance, the finalize function in the worker MPM bridge notifies a supervisor thread, demanding that the whole pool of threads running Tcl interpreters exit in an orderly way. This pointer can be NULL if the bridge has no special need when a child process must exit (unlikely if you have multiple threads running).
  • mpm_exit_handler: mod_rivet replaces the core exit command with a new one (::rivet::exit). This command must handle the process exit in the best possible way for the bridge and the threading model it implements (for the 2 current threaded bridges this implies signaling the threads to exit). ::rivet::exit doesn't actually terminate the process; it interrupts execution by returning a specific error code that the commands ::rivet::catch and ::rivet::try can detect. Before the process is terminated the AbortScript is fired and ::rivet::abort_code returns a message describing the exit condition. For instance, in the worker MPM bridge the finalize function is called after the current thread itself is set up for termination. See the function Rivet_ExitCmd in rivetCore.c for details on how and at what stage this callback is invoked.
  • mpm_thread_interp: must be a function returning the interpreter object (a pointer to a record of type rivet_thread_interp) associated with a given configuration as stored in a rivet_server_conf* object. This element was temporarily introduced in the mpm_bridge_table table and should be accessed through the macro RIVET_PEEK_INTERP.
    interp_obj = RIVET_PEEK_INTERP(private,private->conf);
    Every bridge implementation should have its own way to store interpreter data and manage their status, so this macro (and the associated function) should hide from the module core functions the specific approach followed in a particular bridge.
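
The way the core can treat optional and mandatory callbacks differently can be sketched in a few lines of C. This is only an illustration under stated assumptions: the demo_* types and functions are hypothetical stand-ins (the real callbacks receive Apache and APR objects such as request_rec), the signatures are simplified, and only three of the six table entries are modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Apache structures the real callbacks use. */
typedef struct { int requests_served; } demo_server;
typedef struct { const char *uri; } demo_request;

/* Simplified callback signatures (assumptions, not the real prototypes). */
typedef int  DemoBridge_ServerInit(demo_server *s);
typedef void DemoBridge_ChildInit(demo_server *s);
typedef int  DemoBridge_Request(demo_server *s, demo_request *r);

typedef struct {
    DemoBridge_ServerInit *mpm_server_init;  /* may be NULL */
    DemoBridge_ChildInit  *mpm_child_init;   /* may be NULL */
    DemoBridge_Request    *mpm_request;      /* must never be NULL */
} demo_bridge_table;

/* Checked once when the bridge is loaded: a missing content generator
 * is a fatal configuration error. */
static int demo_validate_bridge(const demo_bridge_table *t)
{
    return (t != NULL && t->mpm_request != NULL);
}

/* Optional callback: test the pointer before calling it. */
static int demo_server_init(const demo_bridge_table *t, demo_server *s)
{
    if (t->mpm_server_init != NULL) return t->mpm_server_init(s);
    return 0;
}

/* Mandatory callback: validated at load time, so the request path
 * needs no run-time test (the assert only documents the invariant). */
static int demo_handle_request(const demo_bridge_table *t,
                               demo_server *s, demo_request *r)
{
    assert(t->mpm_request != NULL);
    return t->mpm_request(s, r);
}

/* A prefork-like bridge: no threads, the request is handled directly. */
static int prefork_like_request(demo_server *s, demo_request *r)
{
    (void)r;
    s->requests_served++;
    return 0;
}

static const demo_bridge_table prefork_like_bridge = {
    NULL, NULL, prefork_like_request
};
```

A caller would first run demo_validate_bridge on the loaded table, refuse to start if it returns 0, and from then on dispatch every request through demo_handle_request.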

Server Initialization and MPM Bridge

RivetChan

The RivetChan system was created in order to have an actual Tcl channel that we could redirect standard output to. This enables us to use, for instance, the regular puts command in .rvt pages. It works by creating a channel that buffers output and, at predetermined times, passes it on to Apache's I/O system. Tcl's regular standard output is replaced with an instance of this channel type, so that, by default, output will go to the web page.
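
The buffering idea can be illustrated with a minimal sketch. It is not the RivetChan implementation (which is built on Tcl's channel driver interface): demo_chan, demo_sink_fn and the tiny buffer size are hypothetical, and the sink callback merely stands in for Apache's output routines. Here data is passed on when the buffer fills up or when a flush is requested explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_CHAN_BUFSIZE 16   /* tiny on purpose, to make flushes visible */

/* Hypothetical sink standing in for Apache's output routines. */
typedef void demo_sink_fn(const char *data, size_t len, void *ctx);

typedef struct {
    char          buf[DEMO_CHAN_BUFSIZE];
    size_t        used;        /* bytes currently buffered */
    demo_sink_fn *sink;
    void         *sink_ctx;
} demo_chan;

static void demo_chan_init(demo_chan *c, demo_sink_fn *sink, void *ctx)
{
    assert(c != NULL && sink != NULL);
    c->used = 0;
    c->sink = sink;
    c->sink_ctx = ctx;
}

/* Hand the buffered bytes to the sink and empty the buffer. */
static void demo_chan_flush(demo_chan *c)
{
    if (c->used > 0) {
        c->sink(c->buf, c->used, c->sink_ctx);
        c->used = 0;
    }
}

/* Buffer the data, flushing every time the buffer becomes full. */
static void demo_chan_write(demo_chan *c, const char *data, size_t len)
{
    while (len > 0) {
        size_t room = DEMO_CHAN_BUFSIZE - c->used;
        size_t n = (len < room) ? len : room;
        memcpy(c->buf + c->used, data, n);
        c->used += n;
        data += n;
        len -= n;
        if (c->used == DEMO_CHAN_BUFSIZE) demo_chan_flush(c);
    }
}

/* Example sink: collects everything it receives into a string. */
typedef struct { char text[128]; size_t len; } demo_page;

static void demo_page_sink(const char *data, size_t len, void *ctx)
{
    demo_page *p = ctx;
    assert(p->len + len < sizeof p->text);
    memcpy(p->text + p->len, data, len);
    p->len += len;
    p->text[p->len] = '\0';
}
```

Writes smaller than the buffer reach the sink only at the next flush, which is why a final demo_chan_flush is needed once the page is complete.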

The global Command

Rivet aims to run standard Tcl code with as few surprises as possible. At times this involves some compromises - in this case regarding the global command. The problem is that the command will create truly global variables. If the user is just cut'n'pasting some Tcl code into Rivet, they most likely just want to be able to share the variable in question with other procs, and don't really care if the variable is actually persistent between pages. Our solution is a proc ::request::global that takes the place of the global command in Rivet templates. If you really need a true global variable, use either ::global or add the :: namespace qualifier to variables you wish to make global.

Page Parsing, Execution and Caching

When a Rivet page is requested, it is transformed into an ordinary Tcl script by parsing the file for the <? ?> processing instruction tags. Everything outside these tags becomes a large puts statement, and everything inside them remains Tcl code.
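
A much simplified version of this transformation is sketched below. It is not Rivet's parser: demo_rvt_to_tcl and its helpers are hypothetical names, literal text is emitted as puts -nonewline strings, and only the characters \ " $ [ ] are escaped, which is an assumption about what is needed to keep the text inert inside a double-quoted Tcl string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append one character to out; returns 0, or -1 if the buffer is full. */
static int demo_put(char *out, size_t outsz, size_t *len, char ch)
{
    if (*len + 1 >= outsz) return -1;      /* keep room for the terminator */
    out[(*len)++] = ch;
    out[*len] = '\0';
    return 0;
}

/* Append a whole string to out; returns 0, or -1 if the buffer is full. */
static int demo_puts(char *out, size_t outsz, size_t *len, const char *s)
{
    while (*s != '\0') {
        if (demo_put(out, outsz, len, *s++) != 0) return -1;
    }
    return 0;
}

/* Turn template text into a Tcl script. Returns 0 on success, -1 if the
 * output buffer is too small. An unterminated <? is copied as code. */
static int demo_rvt_to_tcl(const char *src, char *out, size_t outsz)
{
    size_t len = 0;
    int in_code = 0;                       /* 1 while inside <? ... ?> */

    assert(src != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    if (demo_puts(out, outsz, &len, "puts -nonewline \"") != 0) return -1;

    while (*src != '\0') {
        if (!in_code && src[0] == '<' && src[1] == '?') {
            /* close the current puts string, switch to verbatim Tcl */
            if (demo_puts(out, outsz, &len, "\"\n") != 0) return -1;
            in_code = 1;
            src += 2;
        } else if (in_code && src[0] == '?' && src[1] == '>') {
            /* end of the Tcl code: open a new puts string */
            if (demo_puts(out, outsz, &len, "\nputs -nonewline \"") != 0)
                return -1;
            in_code = 0;
            src += 2;
        } else {
            /* escape characters that are special inside a quoted string */
            if (!in_code && strchr("\\\"$[]", *src) != NULL) {
                if (demo_put(out, outsz, &len, '\\') != 0) return -1;
            }
            if (demo_put(out, outsz, &len, *src++) != 0) return -1;
        }
    }
    /* close the trailing puts string if the page ended in literal text */
    if (!in_code && demo_puts(out, outsz, &len, "\"\n") != 0) return -1;
    return 0;
}
```

Feeding it the page <b><? puts hi ?></b> yields three Tcl commands: a puts for <b>, the embedded puts hi, and a puts for </b>.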

Each .rvt file is evaluated in its own ::request namespace, so that it is not necessary to create and tear down interpreters after each page. By running in its own namespace, though, each page will not run afoul of local variables created by other scripts, because they will be deleted automatically when the namespace goes away after Apache finishes handling the request.

Note
One current problem with this system is that while variables are garbage collected, file handles are not, so that it is very important that Rivet script authors make sure to close all the files they open.

After a script has been loaded and parsed into its "pure Tcl" form, it is also cached, so that it may be used in the future without having to reload it (and re-parse it) from the disk. The number of scripts stored in memory is configurable. This feature can significantly improve performance.
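
The benefit of such a cache can be illustrated with a small sketch. Everything in it is an assumption made for the example and not a description of Rivet's actual cache: entries are keyed by file path plus modification time (so an edited file is parsed again), slots are recycled round-robin when the cache is full, and demo_parse is a hypothetical stand-in for the real page parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DEMO_CACHE_SLOTS 2      /* stands in for the configurable cache size */

typedef struct {
    int  used;
    char path[64];
    long mtime;                 /* assumption: staleness detected via mtime */
    char script[128];           /* the parsed, "pure Tcl" form */
} demo_cache_entry;

typedef struct {
    demo_cache_entry slot[DEMO_CACHE_SLOTS];
    size_t next;                /* next slot to recycle (round-robin) */
    int    parses;              /* how many times the parser actually ran */
} demo_cache;

/* Hypothetical stand-in for the real page parser. */
static void demo_parse(const char *path, char *out, size_t outsz)
{
    snprintf(out, outsz, "# parsed from %s", path);
}

/* Return the cached script for (path, mtime), parsing only on a miss. */
static const char *demo_cache_get(demo_cache *c, const char *path, long mtime)
{
    size_t i;
    demo_cache_entry *e;

    assert(c != NULL && path != NULL);
    for (i = 0; i < DEMO_CACHE_SLOTS; i++) {
        e = &c->slot[i];
        if (e->used && e->mtime == mtime && strcmp(e->path, path) == 0)
            return e->script;   /* hit: no disk access, no parsing */
    }
    /* miss: parse the page and store it in the next slot */
    e = &c->slot[c->next];
    c->next = (c->next + 1) % DEMO_CACHE_SLOTS;
    e->used = 1;
    e->mtime = mtime;
    snprintf(e->path, sizeof e->path, "%s", path);
    demo_parse(path, e->script, sizeof e->script);
    c->parses++;
    return e->script;
}
```

The cache must be zero-initialized before use (for example with memset); the parses counter then shows that repeated requests for an unchanged page run the parser only once.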

Extending Rivet by developing C code procedures

Rivet endows the Tcl interpreter with new commands serving as an interface between the application layer and the Apache web server. Many of these commands are meaningful only when an HTTP request is under way, that is, when a request_rec object allocated by the framework exists and was passed to mod_rivet as an argument of a callback. When a command needs access to a valid request_rec object, its C procedure must check that such a pointer exists and is initialized with valid data. For this purpose the procedure handling requests (Rivet_SendContent) makes a copy of the pointer and keeps it in an internal structure. The copy is set to NULL just before returning to the framework, right after mod_rivet has carried out its request processing. When the pointer copy is NULL the module is outside any request processing, and this condition invalidates the execution of many of the Rivet commands. If they are called in this state (for example in a ChildInitScript, GlobalInitScript, ServerInitScript or ChildExitScript) they fail with a Tcl error you can handle with a catch command.

For this purpose the macro CHECK_REQUEST_REC is defined in src/rivet.h. It accepts two arguments: the thread private data object and the command name. If the pointer is NULL the macro calls Tcl_NoRequestRec and returns TCL_ERROR, causing the command to fail. These are the steps to follow in order to write a new C language command for mod_rivet:

  • Define the command and associated C language procedure in src/mod_rivet_ng/rivetCore.c using the macro RIVET_OBJ_CMD
    RIVET_OBJ_CMD("mycmd",Rivet_MyCmd,private)
    This macro ensures the command is defined as ::rivet::mycmd and its ClientData pointer is defined with the thread private data
  • Add the code of Rivet_MyCmd to src/mod_rivet_ng/rivetCore.c (in case the code resides in a different file also src/Makefile.am should be changed to tell the build system how to compile the code and link it into mod_rivet.so)
  • If the code must have access to the request record in private->r use the macro THREAD_PRIVATE_DATA in order to claim the thread private data, then check for the validity of the pointer using the macro CHECK_REQUEST_REC(private,"::rivet::<cmd_name>")
    TCL_CMD_HEADER(Rivet_MyCmd)
    {
        /* we have to get the thread private data */
        
        THREAD_PRIVATE_DATA(private)
    
        /* if ::rivet::mycmd works within a request processing we have
         * to check if 'private' is bringing a non null request_rec pointer
         */
        
        CHECK_REQUEST_REC(private,"::rivet::mycmd");
        ....
        
        return TCL_OK;
    }
  • Add a test for this command in tests/checkfails.tcl. For instance
    ...
    check_fail no_body
    check_fail virtual_filename unkn
    check_fail mycmd <arg1> <arg2>
    ....
    Where <arg1> <arg2> are optional arguments in case the command has different forms depending on the arguments. Then, if ::rivet::mycmd must fail, tests/failtest.tcl should also be modified as
    virtual_filename->1
    mycmd->1
    The value associated with the test must be 0 in case the command doesn't need to test the private->r pointer.
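
The request_rec guard used in the steps above can be illustrated in isolation with the following sketch. It does not reproduce the definitions in src/rivet.h: the demo_* types, the DEMO_CHECK_REQUEST_REC macro, the return codes and the error text are all hypothetical stand-ins, chosen so that the example compiles without the Apache and Tcl headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_OK    0            /* stand-in for TCL_OK */
#define DEMO_ERROR 1            /* stand-in for TCL_ERROR */

typedef struct { const char *uri; } demo_request_rec;  /* stand-in for request_rec */

typedef struct {
    demo_request_rec *r;        /* non-NULL only while a request is processed */
    char errmsg[128];           /* stand-in for the interpreter error result */
} demo_thread_private;

/* Modeled on the CHECK_REQUEST_REC idea: make the enclosing command
 * fail early when it is called outside request processing. */
#define DEMO_CHECK_REQUEST_REC(p, cmd_name)                               \
    do {                                                                  \
        if ((p)->r == NULL) {                                             \
            snprintf((p)->errmsg, sizeof (p)->errmsg,                     \
                     "%s: command called outside request processing",    \
                     (cmd_name));                                         \
            return DEMO_ERROR;                                            \
        }                                                                 \
    } while (0)

/* A command that needs the request record. */
static int demo_mycmd(demo_thread_private *private)
{
    DEMO_CHECK_REQUEST_REC(private, "::rivet::mycmd");

    /* from here on private->r can be dereferenced safely */
    return (private->r->uri != NULL) ? DEMO_OK : DEMO_ERROR;
}

/* Request handler sketch: publish the pointer, run the command, then
 * reset the copy to NULL before returning to the framework. */
static int demo_send_content(demo_thread_private *private, demo_request_rec *r)
{
    int rc;

    assert(private != NULL && r != NULL);
    private->r = r;
    rc = demo_mycmd(private);
    private->r = NULL;
    return rc;
}
```

Calling demo_mycmd directly on freshly initialized private data (r set to NULL) models a command invoked from an initialization script: it fails and leaves a message, whereas the same command succeeds when reached through demo_send_content.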

Debugging Rivet and Apache

If you are interested in hacking on Rivet, you're welcome to contribute! Invariably, when working with code, things go wrong, and it's necessary to do some debugging. In a server environment like Apache, it can be a bit more difficult to find the right way to do this. Here are some techniques to try.

The first thing you should know is that Apache can be launched as a single process with the -X argument:

httpd -X

On Linux, one of the first things to try is the system call tracer, strace. You don't even have to recompile Rivet or Apache for this to work.

strace -o /tmp/outputfile -s 1000 httpd -X

This command will run httpd in the system call tracer, which leaves its output (there is potentially a lot of it) in /tmp/outputfile. The -s option tells strace to record only the first 1000 bytes of the string arguments of a syscall. Some calls such as write can potentially be much longer than this, so you may want to increase this number. The result is a list of all the system calls made by the program. You want to look at the end, where the failure presumably occurred, to see if you can find anything that looks like an error. If you're not sure what to make of the results, you can always ask on the Rivet development mailing list.

If strace (or its equivalent on your operating system) doesn't answer your question, it may be time to debug Apache and Rivet. To do this, you will need to rebuild mod_rivet. First of all, set the CFLAGS and LDFLAGS environment variables, then configure the build by running the ./configure script with the --enable-symbols option:

export CFLAGS="-g -O0"
export LDFLAGS="-g"
./configure --enable-symbols ......
make
make install

Arguments to ./configure must fit your Apache HTTP web server installation. See the output produced by

./configure --help

And check the section called “Apache Rivet 3.1 Installation” for further information. Since it's easier to debug a single process, we'll still run Apache in single process mode with -X:

@ashland [~] $ gdb /usr/sbin/apache.dbg
GNU gdb 5.3-debian
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "powerpc-linux"...
(gdb) run -X
Starting program: /usr/sbin/apache.dbg -X
[New Thread 16384 (LWP 13598)]
.
.
.

When your Apache session is up and running, you can request a web page with the browser, and see where things go wrong (if you are dealing with a crash, for instance).