1. Introduction

This document tries to document the design of tomcat-3 and the reasons behind this. Tomcat is the result of a long evolution, with many people with different perspectives contributing code and ideas. Most of the code in tomcat was rewritten 2 or 3 times in the last year, with many experiments and prototypes developed off-line. It is important to distinguish between the actual patterns and ideas and the implementation - the later will change and did changed a lot.


2. Goals


3. Design Patterns

Chain of responsibility

"Avoids coupling the sender of a request to its receiver by giving more than one object a chacne to handle it". In our case the servlet API implementation and the request processing ( mapping, auth, etc) are handler by the Interceptors. The operation is chained and passed along until one interceptor handles it.

Interceptors are the mechanism used to implement and extend tomcat. It avoids coupling the sender of the request ( tomcat core, servlets ) with the actual receiver ( one of the modules handling the requested operation ). The module that actually provide the implementation is not known.

There are 2 important reasons for choosing this pattern for tomcat.

The exact same pattern is used in IIS in the form of filters, NES as SAFs and Apache as modules and hooks. This is the main tool of extending those servers. It also allows the integration between tomcat and web servers ( since the same pattern is used tomcat, will be able to reuse existing code and provide server extensions ). The fact that this works well in existing high performance and the extensibility it provides was a very strong argument.

We also want to integrate tomcat with applications and external packages that may provide various services. This will be a deployment decision, and the core and basic modules shouldn't depend on any particular handler implementation. An example is authentication - the service can be provided by the default tomcat interceptor that use a static xml file, by the native web server, by JAAS, or a private API if tomcat is embedded in applications.

The implementation consists in Interceptors and Events ( possibly in 3.3 or 3.4 ). See also Strategy.

Strategy

"Define a family of algorithms, encapsulate each one and make them interchangeable".

This allows an easy evolution of tomcat and simplifies the development in an open source environment. We were able to isolate the original implementation of tomcat 3.0 and re-implement individual components while maintaining the overall stability. We started with a code that was very hard to read because of many incremental changes and fixes, but was extremely stable and very well tested. By using this pattern we moved the ugly algorithm in a module and then provided better implementations.

Separating the actual algorithm from the code that calls it allowed us to focus on what was important for the problem, and remove the dependencies.

It allows us to provide alternative implementations for each service ( we have 2-3 authentication modules, 3-4 mapping interceptors were developed and expect more improvements) and decide on a particular implementation based on stability and performance, after the code is ready.

We can choose a set of components that are best tuned for our application - for example we can use a mapping interceptor ( that parses and matches the request against a set of mapping rules ) that is optimized for memory usage or one tuned for speed. An ISP will choose a specialized algorithms that is able to deal with large number of web applications - for example by loading the mappings for a certain application only when needed ( and maintaining a cache of mappings for frequent access)

One particular application is the use of existing algorithms implemented in native servers. Most web servers have highly tuned algorithms ( not a surprise ), and this seems great for lazy programmers - we can get very fast code without too much work.

This is implemented in RequestInterceptors - see also Chain Of Responsibility.

Observer

This pattern defines a one-to-many dependency between core objects ( Context, ContextManager ) and modules ( Interceptors). When the core object change it's state all the dependents will be notified and updated automatically.

Tomcat state changes at runtime - and all the modules interested in a particular state can be notified and change their internal ( possibly optimized ) data structures. That means the core can be kept simple, without advanced data structures, while modules can use Trees ( JDK1.2 collections) or other complex representations.

It also enable us to keep the core compatible with JDK1.1, while taking advantage of all "modern" features ( this works together with the Strategy pattern in separating the algorithm/data from the caller)

It also allows a simpler core, without any optimization that may affect the code readability. The complex structures are in modules - the Observer will keep the information in sync.

The implementation is done in ContextInterceptor ( that will probably be replaced by a more standard Event/Listener model in 3.3 or 3.4). ContextManager, Context, Container will send notifications each time their state changes, and interceptors will alter their internal state.

Adapter

It converts the interfaces of a class into the interface tomcat.core expects. This is the main enabler for "integration with other applications" requirement.

Adapter is very important for allowing tomcat to reuse external APIs and modules, without requiring them to change and without changing tomcat ( and making it dependent of a particular API/application). It allows us to use multiple APIs for various tasks - for example authentication can be done using JAAS, J2EE APIs, other proprietary APIs or using the web server. Instead of hard-coding a Realm interface and attempting to fit everything under it ( JAAS is a good candidate for such a "generic auth API ", but it's too much for simple cases ) we just build adapters for whatever API we have to integrate with.

This pattern is also used in integration with the web server, with Request, Response, Container acting as adapters for the equivalent objects in web servers.

This is used in conjunction with "Chain of responsibility" and "Strategy" to enable "lazy programming" ( or reuse of existing APIs :-).

A number of Interceptors are using Adapter for their implementation. J2EEInterceptor is a very good example of adapter - on one side it knows tomcat interface ( Interceptor), and the other side knows J2EE interfaces.

Proxy

"It provides a "surrogate" or placeholder for another object to control access to it". This is the main tool in isolating the tomcat internals from servlets and user applications. While some form of class loader hacks can also be used to enhance that, this pattern is a very clean mechanism and combined with the Facade provides many other advantages.

It also has an important role in performance, by allowing us to delay expensive operations until they are needed ( String creation ) and use the more efficient interface in core ( see also Facade).

The Servlet API was designed for easy to use and security for application developers, while the constraints and requirements in tomcat.core are very different.

This pattern is used in JDK1.3 to create a type-safe implementation of interfaces ( no down-casting allowed ).

Why? Security was the original reason, but this pattern is at the core of many performance optimizations and will allow a large number of features.

Facade

Converts the interface of a class into another. In particular, this is used to present web applications with the (right) servlet API, while using more efficient interfaces for the core.

The "facade" name was used ( wrongly ) in many security discussions and in all interfaces, in fact "Proxy" is responsible for security while facade is important for performance and to implement the servlet API on top of a set of internal APIs. For example HttpServletRequest combines access to a number of core interfaces.

Why? The servlet API has certain goals and requirement that are oriented toward the servlet developer. The web server and tomcat.core have different requirements, and by using the right API we can provide a number of optimizations. For example it is possible to avoid String over-use, to use more specialized components, and to use adapters for the (native) web servers.

Another very important benefit of Facade consist in insuring an independence between the current servlet API ( that change with a certain speed ) and the internal representation, and also allows one tomcat instance to support multiple "profiles" of servlet API. For example the transition between servlet 2.2 and 2.3 can be smooth, by allowing 2.2 webapps to run in the same container. This way the users will not have to rewrite ( or recompile ) all their applications every time the spec changes. This can allow tomcat 3.3 to support both servlet 2.2 and servlet 2.3 APIs.


4. Main Components ( tomcat.core )

Request/Response

This is the tomcat representation of a request. It acts as an "Adapter" for the underlying application's representation - in the case of a web server it is the java representation of (request_rec *) for example, if tomcat is embedded in a non-web application it represent the application's notion of request.

This is a passive object, with all operations delegated to modules. It is important to make as much as possible "lazy" evaluations, and keeps the code very simple.

The objects are not a replication of HttpServletRequest and HttpServletResponse. The javax interfaces are designed for web application developers, while core have different needs. It would be very difficult to implement HttpServletRequest as an adapter for request_rec, and it would have a big impact on performance to be restricted to the String-based representation ( that is required for applications - security reasons, etc).

For example, in tomcat 3.2 we started to expose the internal buffers and we'll make more use of this - instead of using the Stream/Writer interfaces, core components will have direct control over the buffering and char/byte translation.

XXX It may be a good idea for the core to use concepts similar with web servers - i.e. only Request, and add Connection to represent the http connection data. The only reason to have Request/Response is to match the servlet model, but the role of core is to match the model used by the web server. A picture would be great.

OutputBuffer

The idea is to have full control over tomcat buffering and over char/byte translation, as this is one of the most expensive operations ( according to empirical tests ). It should allow to plug external byte/char convertors. As we know, multiple copy from buffer to buffer proved to be an important factor in many systems. It doesn't show up in tomcat 3.2 because other factors are too big, but as we approach the limits it will be one of the most important issue.

An important feature will be the integration of jasper and other template systems, where "static" text can be delivered to the server directly, without intermediary buffers, and the binary representation ( encoding) can be pre-computed.

MessageBytes

This is a very special and important component of tomcat3. The first problem it tried to solve was the conversion from byte to chars. The charset is known very late ( either when Content-Type header is parsed or even later, when the user sets it in servlet 2.3), and converting the received bytes to Strings can't be done (corectly ) until the encoding is determined.

MessageBytes have another benefit - it's a great mechanism to avoid Garbage Collection costs. Most of the time the user will not access all the headers, and probably neither most of the request information ( url components, etc). That means a "HelloWorld" servlet ( or any servlet that just display some info) will require no String allocations when MessageBytes are used - as oposed to 30-40 Strings allocated ( and GC ) before.

Another ( probably less visible ) benefit is the parsing of the request. Parsing using String proved to lead to inefficient code - Java provides general purpose utilities like StringTokenizer or substring() / indexOf(), but they are designed as "general purpose", with different goals in mind. The parsing code in tomcat 3.0 used those mechanisms and that resulted in 100+ String allocations per request.

Note that String is a very important class in Java - it have the special characteristic of beeing "imutable". That has big implications from a security point of view, and most Servlet API methods are using String as result type/parameter. On the other side String is extremely expensive for server-side operation - it can't be recycled. See also the discussion on GarbageCollection and memory.

Context

Context is the representation of a web application. It have all the properties that are definable in web.xml and server.xml.

The current implementation is complex and has much more than that - this have to change.

The context is associated with a Request after the contextMap() callback completes. By default the mapping interceptor ( SimpleMapper in the default case ) will implement this hook, but other interceptors can provide extra functionality ( support for ~user, dynamic addition of contexts when the first request is made, etc )

Container

It represent a group of URLs sharing a common set of properties. It's analog with per_dir configuration in apache. The "standard" properties in servlet spec are the handler ( specified in servlet mappings) and the security properties.

The Container is associated with a Request after the requestMap() hook completes. SimpleMapper is the "core"
implementation of this hook, providing support for prefix, exact and extension mappings. Other interceptors can provide optimized mappings for particular subsets ( like JspIntercepor ) or implement custom mapping schemes.

This is a very important operation ( it should be the most expensive part of the processing - unfortunately we still have to eliminate few other hotspots before this will happen ). We exepect alternate implementations using more advanced data structures and alghoritms ( Trees, etc ). See also the discussion on "evolution".

In tomcat 3.3 it is possible to specify per context and container interceptors - after the mapping, tomcat will know what is the container where the request belongs and can invoke specialized interceptors defined only for that particular set of URLs. This is similar with what all other web servers provide ( the order is similar with NES).

ContextManager

Main entry point for tomcat execution. It coordinates the activity of modules.

In tomcat, the application embedding tomcat ( web server, generic application, tomcat standalone server) have control over execution. Embedding tomcat requires an adapter that will construct the Request/Response adapters and use ContextManager to process the request.



XXX Probably a better name would be Server, or TomcatContainer.

Interceptor

The interceptors represents the building blocks and extension mechansim for tomcat. Most of tomcat functionality is implemented using modules. Modules operate on tomcat's core objects and can hook in and extend tomcat. They are designed around "Chain of Responsibility" and "Strategy" patterns, and inspired from Apache hooks, with influences from ISAPI ( filters ) and NSAPI ( saf) . As you know most of Apache ( including the core ) is implemented using a simple hook mechanism.

One interesting sub-project is to make sure interceptors are interoperable with major web server mechansims - and play the same role as mod_perl for example: allow easy server extension using Java ( i.e. provide access to all the full power of native server modules).

You can control all aspects of request processing - parsing, authentication, authorization, sessions, response commit ( before headers are sent), buffer commit ( before any buffer is sent - it can be used to support HTTP1.1 for example !), before and after body.

Interceptors are also used to provide integration with web servers ( in tomcat <=3.2 a special "Connector" interface was used, but we found that Connector was in fact a subset of Interceptor and web integration requires more then just start/stop, so using the same mechanism is a better idea).

Currently tomcat defines a number of hooks - other will be added as needed:

Helpers ( tomcat.helpers )

In order to keep the core as simple and readable as possible, the few functions that are implemented inside core ( instead of using interceptors ) are implemented in specialized helpers. For example reading the form data, encoding, etc. The helpers are grouped by functionality and the intention is to allow easy evolution - it is possible to rewrite or replace them with minimal effect on the rest of tomcat. This is one way to make sure team development works and new contributors can improve tomcat without the fear of broking the core.


5. Execution paths ( ContextManager and core )

Initialization

XXX server intialization and configuration, context add/init/shutdown/remove, etc

Request processing

Each web request is received by the web adapter, which will provide the Request/Response adapters. It will then call ContextManager.service(), and tomcat core will begin processing the request. A similar path happens for request that are not generated by a HTTP transaction - like sub-requests or non-web requests ( when tomcat is embedded in applications != web servers )


During request processing, tomcat will send notifications ("Observer") to interested modules about the current stage of the processing, for example allowing the modules to set transactions contexts ( J2EE interceptor ) or instrument tomcat ( timing individual operations for fine tuning ).

This model is identical with the one used in Apache, IIS and NES ( we tried to represent all notifications, so it tries to be a union, not intersection). Why? Performance was the original reason, and the fact that the model was well-known and polished, so more trust-worthy then alternative models. It also turned out that the code organization improved a lot by using the "Strategy" pattern ( compared with tomcat 3.0 ), and many optimizations became possible.

Another reason is related with the idea of making tomcat a real web-server extension, allowing people to develop web server modules ( filter, SAF, hooks) in java using a clean API. There are many ways to implement a web server, but this design choose Apache, IIS and NES as model, and tries to allow bi-directional code reuse.


6. Interceptors ( tomcat.context, tomcat.request)

Standard modules

Adding custom modules

Revolution by Evolution

By using the "Strategy" pattern it is possible to provide alternative alghoritms and implementations without affecting the core modules. In time the new "strategies" can be tested ( by first using them on experimental sytes, then in production ) and eventually replace the "core" modules.

Tomcat is designed such that the core will have a minimal role in how the request is processed. We try to keep the number of methods and structures as small as possible, while "evolving" the core based on the use cases and needs of various modules. We expect that 3.3 or 3.4 will have a stable core.

This method allowed a smooth evolution from tomcat3.0 who had a very complex and ugly (!) code, due to numerous additions and fixes. It is a standard form of refactoring that proved very useful.




7. Reusable Components ( tomcat.util )

Tomcat tries as much as possible to keep most of the code in standalone components, that have no dependency on tomcat.core. For example, the class loader, the session manager, the TCP server, authentication - all of them can easily be used in different architectures, and don't (shouldn't ) have any reference to core interfaces.

There are many reasons for this: we think tomcat should focus on implementing the servlet API and shouldn't reinvent the wheel, and we also think all those components should be generic enough to be used in other servers.

For example, most servlet container allows you to plug in a session manager. Session manager have ( shouldn't have ) too much need for knowing about the container internals - it is a special form of object database. Most of the time is better to write code independent of a particular container and using adapters to hook into different containers.

ThreadPool




8. Security

XXX Move to separate document!

The Proxy and FacadeManager plays a very important role in insuring security, by providing control of the communication between webapps and tomcat internals. It is possible to augment that with a special class loader mechanism ( for paranoid administrators), but most of the information that is available suggest the proxy is a very effective design pattern for this.

Since tomcat.core runs with special (java) privileges, it is important to make sure the code is safe and the user can't call arbitrary methods. That will require a serious review over what methods are public, and how internal object instances can be accessed from outside. FacadeManager should be the only point of entry.

We do need to provide a certain internal API to applications that will allow enhanced performance - for example Jasper can make use of the internal buffers, etc.

Existing risks and problems

Isolation. It is possbile for an application to store the reference to the HttpServletRequests it receives in a Hashtable. If the server is using recycling, the "bad" application will end up with references to all active requests, and will be able to access parameters and informations of other web applications.
In tomcat the HttpServletRequest is a proxy and a facade for the real request. Tomcat will recycle the Request object, and can ( based on a config option) create new HttpServletRequestFacades for each request.
Constructor access. It is possible that an application will use the current internal APIs to gain unauthorized access and call internal APIs
The only way to get access to internal APIs is using a special context/request attribute. We allow only "trusted" applications to call the method, and return null if the trusted flag is not set. We need to carefully review the code to be sure no other way to access the internals is possible, and reduce the number of public constructors and public methods. We also need to add a ( JDK1.1 compatible ) mechanism to do security checks when objects are constructed. ( for example add a TomcatSecurity object with empty methods for 1.1 and using TomcatPermission and the 1.2 security system for 1.2.
DOS - large body with POST data
Tomcat will read the full body when a POST is processed. We need to set upper limits or at least warn of possible abuse.
DOS - general
There are many possible DOS attacks, we need to identify them and provide mechanisms to reduce the effect.

9. Embedding tomcat

Use the tomcat APIs to create ContextManager and set it up or stop it. Integration with your configuration API is very simple - tomcat doesn't care how you set it up, as long as the right object are created and the setter methods are called.

All integration is done using the "Adapter" pattern - the Interceptors provide you with hooks into all tomcat internals, and you can control every aspect of request processing.


10. Integration with Web servers

Most commercial web sites run on one of the 4 big web servers, and we want to make sure tomcat can work well with those. After all, most people are using servlets for web applications :-)

There are huge problems, even with the current architecture. For example the mapping defined in servlet 2.2 is different from the mapping used in the current web servers, form based login is very difficult to integrate ( and will require a lot of native code to protect non-java resources using the same authentication ).

Most of the problems still require a lot of work, but we hope that the current architecture will be able to handle that. A bigger goal will be to transform the Interceptors into real web server extensions, so people can develop hooks/modules/SAF/filters in java instead of C ( or mod_perl ).

Tomcat and other Web Servers

Tomcat design is based on the features of Apache modules/hooks, ISAPI and NSAPI. We tried to not limit ourself to what is common to all. In time it is likely that tomcat will have more hooks and provide the same level of extensibility as the other servers. We tried to add the hooks by following "when somebody needs it " rule - and so far a number of callbacks are still to be added.

Please read the Apache, IIS and NES documentations & code, as. Except for using different names and Java-style for interfaces, the model is the same.

The mapping is:

Tomcat

Apache

ISAPI

NSAPI

Other

Request / Response

request_rec
conn_rec




ServletWrapper

Handler




Context

server_rec /
per_dir_config




Container

per_dir_config




Interceptor

Module / hooks




ContextManager








Missing: