Package org.apache.tomcat.util.digester
Introduction
In many application environments that deal with XML-formatted data, it is useful to be able to process an XML document in an "event driven" manner, where particular Java objects are created (or methods of existing objects are invoked) when particular patterns of nested XML elements have been recognized. Developers familiar with the Simple API for XML Parsing (SAX) approach to processing XML documents will recognize that the Digester provides a higher level, more developer-friendly interface to SAX events, because most of the details of navigating the XML element hierarchy are hidden -- allowing the developer to focus on the processing to be performed.
In order to use a Digester, the following basic steps are required:
- Create a new instance of the org.apache.commons.digester.Digester class. Previously created Digester instances may be safely reused, as long as you have completed any previously requested parse, and you do not try to use a particular Digester instance from more than one thread at a time.
- Set any desired configuration properties that will customize the operation of the Digester when you next initiate a parse operation.
- Optionally, push any desired initial object(s) onto the Digester's object stack.
- Register all of the element matching patterns for which you wish to have processing rules fired when this pattern is recognized in an input document. You may register as many rules as you like for any particular pattern. If there is more than one rule for a given pattern, the rules will be executed in the order that they were listed.
- Call the digester.parse() method, passing a reference to the XML document to be parsed in one of a variety of forms. See the Digester.parse() documentation for details. Note that you will need to be prepared to catch any IOException or SAXException that is thrown by the parser, or any runtime exception that is thrown by one of the processing rules.
For example code, see the usage examples and the FAQ.
Digester Configuration Properties
An org.apache.commons.digester.Digester instance contains several configuration properties that can be used to customize its operation. These properties must be configured before you call one of the parse() variants, in order for them to take effect on that parse.
Digester Configuration Properties (property - description):
- classLoader - You can optionally specify the class loader that will be used to load classes when required by the ObjectCreateRule and FactoryCreateRule rules. If not specified, application classes will be loaded from the thread's context class loader (if the useContextClassLoader property is set to true) or the same class loader that was used to load the Digester class itself.
- errorHandler - You can optionally specify a SAX ErrorHandler that is notified when parsing errors occur. By default, any parsing errors that are encountered are logged, but Digester will continue processing as well.
- namespaceAware - A boolean that is set to true to perform parsing in a manner that is aware of XML namespaces. Among other things, this setting affects how elements are matched to processing rules. See Namespace Aware Parsing for more information.
- ruleNamespaceURI - The public URI of the namespace with which all subsequently added rules are associated, or null for adding rules that are not associated with any namespace. See Namespace Aware Parsing for more information.
- rules - The Rules component that actually performs matching of Rule instances against the current element nesting pattern is pluggable. By default, Digester includes a Rules implementation that behaves as described in this document. See Pluggable Rules Processing for more information.
- useContextClassLoader - A boolean that is set to true if you want application classes required by FactoryCreateRule and ObjectCreateRule to be loaded from the context class loader of the current thread. By default, classes will be loaded from the class loader that loaded this Digester class. NOTE - This property is ignored if you set a value for the classLoader property; that class loader will be used unconditionally.
- validating - A boolean that is set to true if you wish to validate the XML document against a Document Type Definition (DTD) that is specified in its DOCTYPE declaration. The default value of false requests a parse that only detects "well formed" XML documents, rather than "valid" ones.
In addition to the scalar properties defined above, you can also register
a local copy of a Document Type Definition (DTD) that is referenced in a
DOCTYPE
declaration. Such a registration tells the XML parser
that, whenever it encounters a DOCTYPE
declaration with the
specified public identifier, it should utilize the actual DTD content at the
registered system identifier (a URL), rather than the one in the
DOCTYPE
declaration.
For example, the Struts framework controller servlet uses the following registration in order to tell Struts to use a local copy of the DTD for the Struts configuration file. This allows usage of Struts in environments that are not connected to the Internet, and speeds up processing even at Internet connected sites (because it avoids the need to go across the network).
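    // Look the DTD up as a class loader resource. Calling new URL(...) on a
    // bare path would throw MalformedURLException because it has no protocol.
    URL url = getClass().getResource("/org/apache/struts/resources/struts-config_1_0.dtd");
    digester.register
      ("-//Apache Software Foundation//DTD Struts Configuration 1.0//EN",
       url.toString());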
As a side note, the system identifier used in this example is the path that would be passed to java.lang.ClassLoader.getResource() or java.lang.ClassLoader.getResourceAsStream(). The actual DTD resource is loaded through the same class loader that loads all of the Struts classes -- typically from the struts.jar file.
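The effect of such a registration can be pictured with a plain SAX EntityResolver. The following is only a minimal sketch of the idea, not Digester's own implementation: the class name LocalDtdResolver and its register() method are hypothetical, and only the standard org.xml.sax types are assumed.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of Digester): maps DTD public identifiers
// to the URLs of local copies.
class LocalDtdResolver implements EntityResolver {
    private final Map<String, String> localCopies = new HashMap<>();

    // Remember that the DTD with this public identifier lives at entityUrl.
    void register(String publicId, String entityUrl) {
        localCopies.put(publicId, entityUrl);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String local = localCopies.get(publicId);
        // Returning null tells the parser to fall back to the system
        // identifier given in the DOCTYPE declaration.
        return (local == null) ? null : new InputSource(local);
    }

    public static void main(String[] args) {
        LocalDtdResolver resolver = new LocalDtdResolver();
        resolver.register("-//Example//DTD Test 1.0//EN", "file:///tmp/test.dtd");
        InputSource source = resolver.resolveEntity(
                "-//Example//DTD Test 1.0//EN", "http://example.com/test.dtd");
        System.out.println(source.getSystemId());
    }
}
```

A resolver of this kind would be handed to a SAX parser through XMLReader.setEntityResolver(); the public identifier and URLs used in main() are made up for the illustration.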
The Object Stack
One very common use of org.apache.commons.digester.Digester
technology is to dynamically construct a tree of Java objects, whose internal
organization, as well as the details of property settings on these objects,
are configured based on the contents of the XML document. In fact, the
primary reason that the Digester package was created (it was originally part
of Struts, and then moved to the Commons project because it was recognized
as being generally useful) was to facilitate the
way that the Struts controller servlet configures itself based on the contents
of your application's struts-config.xml
file.
To facilitate this usage, the Digester exposes a stack that can be manipulated by processing rules that are fired when element matching patterns are satisfied. The usual stack-related operations are made available, including the following:
- clear() - Clear the current contents of the object stack.
- peek() - Return a reference to the top object on the stack, without removing it.
- pop() - Remove the top object from the stack and return it.
- push() - Push a new object onto the top of the stack.
A typical design pattern, then, is to fire a rule that creates a new object and pushes it on the stack when the beginning of a particular XML element is encountered. The object will remain there while the nested content of this element is processed, and it will be popped off when the end of the element is encountered. As we will see, the standard "object create" processing rule supports exactly this functionality in a very convenient way.
Several potential issues with this design pattern are addressed by other features of the Digester functionality:
- How do I relate the objects being created to each other? - The Digester supports standard processing rules that pass the top object on the stack as an argument to a named method on the next-to-top object on the stack (or vice versa). This rule makes it easy to establish parent-child relationships between these objects. One-to-one and one-to-many relationships are both easy to construct.
- How do I retain a reference to the first object that was created? - As you review the description of what the "object create" processing rule does, it would appear that the first object you create (i.e. the object created by the outermost XML element you process) will disappear from the stack by the time that XML parsing is completed, because the end of the element would have been encountered. However, Digester will maintain a reference to the very first object ever pushed onto the object stack, and will return it to you as the return value from the parse() call. Alternatively, you can push a reference to some application object onto the stack before calling parse(), and arrange that a parent-child relationship be created (by appropriate processing rules) between this manually pushed object and the ones that are dynamically created. In this way, the pushed object will retain a reference to the dynamically created objects (and therefore all of their children), and will be returned to you after the parse finishes as well.
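The push-on-begin, pop-on-end pattern described above can be sketched with nothing but the JDK's SAX parser. This is an illustration of the design pattern under stated assumptions, not Digester code: the Node class, StackHandler, and parse() helper are hypothetical names, and every element is simply turned into a Node that is attached to its parent.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ObjectStackSketch {
    // Hypothetical node type standing in for application objects.
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    static class StackHandler extends DefaultHandler {
        private final Deque<Node> stack = new ArrayDeque<>();
        Node root; // the first object ever pushed, kept after the stack empties

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            Node node = new Node(qName);
            if (root == null) {
                root = node;
            }
            stack.push(node); // stays on the stack while nested content is processed
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            Node child = stack.pop();
            Node parent = stack.peek(); // null once the outermost element ends
            if (parent != null) {
                parent.children.add(child); // parent-child link survives the pop
            }
        }
    }

    static Node parse(String xml) {
        StackHandler handler = new StackHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.root;
    }

    public static void main(String[] args) {
        Node root = parse("<foo><bar/><bar/></foo>");
        System.out.println(root.name + " has " + root.children.size() + " children");
    }
}
```

Keeping a separate root reference mirrors the way the first pushed object remains reachable after the stack has been emptied.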
Element Matching Patterns
A primary feature of the org.apache.commons.digester.Digester
parser is that the Digester automatically navigates the element hierarchy of
the XML document you are parsing for you, without requiring any developer
attention to this process. Instead, you focus on deciding what functions you
would like to have performed whenever a certain arrangement of nested elements
is encountered in the XML document being parsed. Such arrangements are
specified using element matching patterns.
A very simple element matching pattern is a simple string like "a". This
pattern is matched whenever an <a>
top-level element is
encountered in the XML document, no matter how many times it occurs. Note that
nested <a>
elements will not match this
pattern -- we will describe means to support this kind of matching later.
The next step up in matching pattern complexity is "a/b". This pattern will
be matched when a <b>
element is found nested inside a
top-level <a>
element. Again, this match can occur as many
times as desired, depending on the content of the XML document being parsed.
You can use multiple slashes to define a hierarchy of any desired depth that
will be matched appropriately.
For example, assume you have registered processing rules that match patterns "a", "a/b", and "a/b/c". For an input XML document with the following contents, the indicated patterns will be matched when the corresponding element is parsed:
    <a>         -- Matches pattern "a"
      <b>       -- Matches pattern "a/b"
        <c/>    -- Matches pattern "a/b/c"
        <c/>    -- Matches pattern "a/b/c"
      </b>
      <b>       -- Matches pattern "a/b"
        <c/>    -- Matches pattern "a/b/c"
        <c/>    -- Matches pattern "a/b/c"
        <c/>    -- Matches pattern "a/b/c"
      </b>
    </a>
It is also possible to match a particular XML element, no matter how it is
nested (or not nested) in the XML document, by using the "*" wildcard character
in your matching pattern strings. For example, an element matching pattern
of "*/a" will match an <a>
element at any nesting position
within the document.
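As a rough illustration of the two pattern forms described so far, the following sketch tests a single pattern against the slash-separated path of the element currently being parsed. The class and method names are hypothetical, and the sketch only answers whether one pattern matches; it does not model how Digester chooses among several competing patterns.

```java
// Illustrative only: a simplified model of exact and "*/" wildcard patterns.
class PatternSketch {
    // path is the nesting of the current element, for example "a/b/c".
    static boolean matches(String pattern, String path) {
        if (pattern.startsWith("*/")) {
            String tail = pattern.substring(2);
            // The wildcard form matches the tail at any nesting depth,
            // including the top level.
            return path.equals(tail) || path.endsWith("/" + tail);
        }
        // Without a wildcard, the whole path must be identical.
        return pattern.equals(path);
    }

    public static void main(String[] args) {
        System.out.println(matches("a/b", "a/b"));
        System.out.println(matches("a", "x/a"));
        System.out.println(matches("*/a", "x/y/a"));
    }
}
```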
It is quite possible that, when a particular XML element is being parsed, the pattern for more than one registered processing rule will be matched, either because you registered more than one processing rule with the same matching pattern, or because one or more exact pattern matches and wildcard pattern matches are satisfied by the same element.
When this occurs, the corresponding processing rules will all be fired in order. begin (and body) method calls are executed in the order that the Rules were initially registered with the Digester, whilst end method calls are executed in reverse order. In other words, the order is first in, last out.
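The ordering can be made concrete with a small model. This sketch is not Digester code; the class FiringOrderSketch and its fire() method are hypothetical, and it simply lists the calls made for one matched element (with no nested elements) given the rule names in registration order.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models the call order for one matched element.
class FiringOrderSketch {
    // ruleNames are given in registration order; the result lists the calls
    // in the order in which they would be made.
    static String fire(String... ruleNames) {
        List<String> calls = new ArrayList<>();
        for (String name : ruleNames) {
            calls.add(name + ".begin"); // start tag: registration order
        }
        for (String name : ruleNames) {
            calls.add(name + ".body");  // body text: registration order
        }
        for (int i = ruleNames.length - 1; i >= 0; i--) {
            calls.add(ruleNames[i] + ".end"); // end tag: reverse order
        }
        return String.join(" ", calls);
    }

    public static void main(String[] args) {
        System.out.println(fire("create", "setProperties", "setNext"));
    }
}
```

The reverse order for end calls is what lets a rule that pushed an object in begin() pop it only after later-registered rules have finished using it.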
Processing Rules
The previous section documented how you identify when you wish to have certain actions take place. The purpose of processing rules is to define what should happen when the patterns are matched.
Formally, a processing rule is a Java class that extends the org.apache.commons.digester.Rule class. Each Rule implements one or more of the following event methods that are called at well-defined times when the matching patterns corresponding to this rule trigger it:
- begin() - Called when the beginning of the matched XML element is encountered. A data structure containing all of the attributes corresponding to this element is passed as well.
- body() - Called when nested content (that is not itself XML elements) of the matched element is encountered. Any leading or trailing whitespace will have been removed as part of the parsing process.
- end() - Called when the ending of the matched XML element is encountered. If nested XML elements that matched other processing rules were included in the body of this element, the processing rules for those elements will already have completed before this method is called.
- finish() - Called when the parse has been completed, to give each rule a chance to clean up any temporary data it might have created and cached.
As you are configuring your digester, you can call the
addRule()
method to register a specific element matching pattern,
along with an instance of a Rule
class that will have its event
handling methods called at the appropriate times, as described above. This
mechanism allows you to create Rule
implementation classes
dynamically, to implement any desired application specific functionality.
In addition, a set of processing rule implementation classes is provided, which deal with many common programming scenarios. These classes include the following:
- ObjectCreateRule - When the begin() method is called, this rule instantiates a new instance of a specified Java class, and pushes it on the stack. The class name to be used is defaulted according to a parameter passed to this rule's constructor, but can optionally be overridden by a classname passed via the specified attribute to the XML element being processed. When the end() method is called, the top object on the stack (presumably, the one we added in the begin() method) will be popped, and any reference to it (within the Digester) will be discarded.
- FactoryCreateRule - A variation of ObjectCreateRule that is useful when the Java class with which you wish to create an object instance does not have a no-arguments constructor, or where you wish to perform other setup processing before the object is handed over to the Digester.
- SetPropertiesRule - When the begin() method is called, the digester uses the standard Java Reflection API to identify any JavaBeans property setter methods (on the object at the top of the digester's stack) whose property names match the attributes specified on this XML element, and then calls them individually, passing the corresponding attribute values. These natural mappings can be overridden. This allows (for example) a class attribute to be mapped correctly. It is recommended that this feature should not be overused - in most cases, it's better to use the standard BeanInfo mechanism. A very common idiom is to define an object create rule, followed by a set properties rule, with the same element matching pattern. This causes the creation of a new Java object, followed by "configuration" of that object's properties based on the attributes of the same XML element that created this object.
- SetNextRule - When the end() method is called, the digester analyzes the next-to-top element on the stack, looking for a property setter method for a specified property. It then calls this method, passing the object at the top of the stack as an argument. This rule is commonly used to establish one-to-many relationships between the two objects, with the method name commonly being something like "addChild".
- CallMethodRule - This rule sets up a method call to a named method of the top object on the digester's stack, which will actually take place when the end() method is called. You configure this rule by specifying the name of the method to be called, the number of arguments it takes, and (optionally) the Java class name(s) defining the type(s) of the method's arguments. The actual parameter values, if any, will typically be accumulated from the body content of nested elements within the element that triggered this rule, using the CallParamRule discussed next.
- CallParamRule - This rule identifies the source of a particular numbered (zero-relative) parameter for a CallMethodRule within which we are nested. You can specify that the parameter value be taken from a particular named attribute, or from the nested body content of this element.
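To give a feel for what a "set properties" step involves, here is a minimal reflection-based sketch. It is not the SetPropertiesRule implementation: the class names are hypothetical, only String and int setters are handled, and the attribute map stands in for the SAX attribute list.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

class SetPropertiesSketch {
    // Hypothetical bean used only for this illustration.
    public static class Bar {
        private int id;
        private String title;
        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
    }

    // Calls setXxx(value) on target for every attribute "xxx" that has a
    // matching public one-argument setter; other attributes are ignored.
    static void setProperties(Object target, Map<String, String> attributes) {
        for (Map.Entry<String, String> attr : attributes.entrySet()) {
            String name = attr.getKey();
            if (name.isEmpty()) {
                continue;
            }
            String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            for (Method m : target.getClass().getMethods()) {
                if (!m.getName().equals(setter) || m.getParameterCount() != 1) {
                    continue;
                }
                Class<?> type = m.getParameterTypes()[0];
                Object value;
                if (type == String.class) {
                    value = attr.getValue();
                } else if (type == int.class) {
                    value = Integer.parseInt(attr.getValue()); // String to int conversion
                } else {
                    continue; // other conversions are left out of this sketch
                }
                try {
                    m.invoke(target, value);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
                break;
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("id", "123");
        attrs.put("title", "The First Child");
        Bar bar = new Bar();
        setProperties(bar, attrs);
        System.out.println(bar.getId() + " / " + bar.getTitle());
    }
}
```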
You can create instances of the standard Rule classes and register them by calling digester.addRule(), as described above. However, because their usage is so common, shorthand registration methods are defined for each of the standard rules, directly on the Digester class. For example, the following code sequence:
    Rule rule = new SetNextRule(digester, "addChild",
                                "com.mycompany.mypackage.MyChildClass");
    digester.addRule("a/b/c", rule);
can be replaced by:
    digester.addSetNext("a/b/c", "addChild",
                        "com.mycompany.mypackage.MyChildClass");
Logging
Logging is a vital tool for debugging Digester rulesets. Digester can log copious amounts of debugging information. So, you need to know how logging works before you start using Digester seriously.
Two main logs are used by Digester:
- SAX-related messages are logged to org.apache.commons.digester.Digester.sax. This log gives information about the basic SAX events received by Digester.
- org.apache.commons.digester.Digester is used for everything else. You'll probably want to have this log turned up during debugging but turned down during production due to the high message volume.
Usage Examples
Creating a Simple Object Tree
Let's assume that you have two simple JavaBeans, Foo and Bar, with the following method signatures:
    // Method signatures only; bodies are omitted.
    package mypackage;
    public class Foo {
      public void addBar(Bar bar);
      public Bar findBar(int id);
      public Iterator getBars();
      public String getName();
      public void setName(String name);
    }

    package mypackage;
    public class Bar {
      public int getId();
      public void setId(int id);
      public String getTitle();
      public void setTitle(String title);
    }
and you wish to use Digester to parse the following XML document:
    <foo name="The Parent">
      <bar id="123" title="The First Child"/>
      <bar id="456" title="The Second Child"/>
    </foo>
A simple approach is to use Digester in the following way to set up the parsing rules, and then process an input file containing this document:
    Digester digester = new Digester();
    digester.setValidating(false);
    digester.addObjectCreate("foo", "mypackage.Foo");
    digester.addSetProperties("foo");
    digester.addObjectCreate("foo/bar", "mypackage.Bar");
    digester.addSetProperties("foo/bar");
    digester.addSetNext("foo/bar", "addBar", "mypackage.Bar");
    // "input" is a placeholder for the document to parse
    // (for example a File, InputStream, or InputSource).
    Foo foo = (Foo) digester.parse(input);
In order, these rules do the following tasks:
- When the outermost <foo> element is encountered, create a new instance of mypackage.Foo and push it on to the object stack. At the end of the <foo> element, this object will be popped off of the stack.
- Cause properties of the top object on the stack (i.e. the Foo object that was just created and pushed) to be set based on the values of the attributes of this XML element.
- When a nested <bar> element is encountered, create a new instance of mypackage.Bar and push it on to the object stack. At the end of the <bar> element, this object will be popped off of the stack (i.e. after the remaining rules matching foo/bar are processed).
- Cause properties of the top object on the stack (i.e. the Bar object that was just created and pushed) to be set based on the values of the attributes of this XML element. Note that type conversions are automatically performed (such as String to int for the id property), for all converters registered with the ConvertUtils class from the commons-beanutils package.
- Cause the addBar method of the next-to-top element on the object stack (which is why this is called the "set next" rule) to be called, passing the element that is on the top of the stack, which must be of type mypackage.Bar. This is the rule that causes the parent/child relationship to be created.
Once the parse is completed, the first object that was ever pushed on to the
stack (the Foo
object in this case) is returned to you. It will
have had its properties set, and all of its child Bar
objects
created for you.
Processing A Struts Configuration File
As stated earlier, the primary reason that the
Digester
package was created is because the
Struts controller servlet itself needed a robust, flexible, easy to extend
mechanism for processing the contents of the struts-config.xml
configuration that describes nearly every aspect of a Struts-based application.
Because of this, the controller servlet contains a comprehensive, real world,
example of how the Digester can be employed for this type of a use case.
See the initDigester()
method of class
org.apache.struts.action.ActionServlet
for the code that creates
and configures the Digester to be used, and the initMapping()
method for where the parsing actually takes place.
(Struts binary and source distributions can be acquired at http://struts.apache.org/.)
The following discussion highlights a few of the matching patterns and processing rules that are configured, to illustrate the use of some of the Digester features. First, let's look at how the Digester instance is created and initialized:
    Digester digester = new Digester();
    digester.push(this); // Push controller servlet onto the stack
    digester.setValidating(true);
We see that a new Digester instance is created, and is configured to use a validating parser. Validation will occur against the struts-config_1_0.dtd DTD that is included with Struts (as discussed earlier). In order to provide a means of tracking the configured objects, the controller servlet instance itself will be added to the digester's stack.
    digester.addObjectCreate("struts-config/global-forwards/forward",
                             forwardClass, "className");
    digester.addSetProperties("struts-config/global-forwards/forward");
    digester.addSetNext("struts-config/global-forwards/forward",
                        "addForward",
                        "org.apache.struts.action.ActionForward");
    digester.addSetProperty
      ("struts-config/global-forwards/forward/set-property",
       "property", "value");
The rules created by these lines are used to process the global forward
declarations. When a <forward>
element is encountered,
the following actions take place:
- A new object instance is created -- the ActionForward instance that will represent this definition. The Java class name defaults to that specified as an initialization parameter (which we have stored in the String variable forwardClass), but can be overridden by using the "className" attribute (if it is present in the XML element we are currently parsing). The new ActionForward instance is pushed onto the stack.
- The properties of the ActionForward instance (at the top of the stack) are configured based on the attributes of the <forward> element.
- Nested occurrences of the <set-property> element cause calls to additional property setter methods to occur. This is required only if you have provided a custom implementation of the ActionForward class with additional properties that are not included in the DTD.
- The addForward() method of the next-to-top object on the stack (i.e. the controller servlet itself) will be called, passing the object at the top of the stack (i.e. the ActionForward instance) as an argument. This causes the global forward to be registered, and as a result of this it will be remembered even after the stack is popped.
- At the end of the <forward> element, the top element (i.e. the ActionForward instance) will be popped off the stack.
Later on, the digester is actually executed as follows:
    InputStream input =
      getServletContext().getResourceAsStream(config);
    ...
    try {
        digester.parse(input);
        input.close();
    } catch (SAXException e) {
        ... deal with the problem ...
    }
As a result of the call to parse(), all of the configuration information that was defined in the struts-config.xml file is now represented as collections of objects cached within the Struts controller servlet, as well as being exposed as servlet context attributes.
Parsing Body Text In XML Files
The Digester module also allows you to process the nested body text in an XML file, not just the elements and attributes that are encountered. The following example is based on an assumed need to parse the web application deployment descriptor (/WEB-INF/web.xml) for the current web application, and record the configuration information for a particular servlet. To record this information, assume the existence of a bean class with the following method signatures (among others):
    package com.mycompany;
    public class ServletBean {
      public void setServletName(String servletName);
      public void setServletClass(String servletClass);
      public void addInitParam(String name, String value);
    }
We are going to process the web.xml
file that declares the
controller servlet in a typical Struts-based application (abridged for
brevity in this example):
    <web-app>
      ...
      <servlet>
        <servlet-name>action</servlet-name>
        <servlet-class>org.apache.struts.action.ActionServlet</servlet-class>
        <init-param>
          <param-name>application</param-name>
          <param-value>org.apache.struts.example.ApplicationResources</param-value>
        </init-param>
        <init-param>
          <param-name>config</param-name>
          <param-value>/WEB-INF/struts-config.xml</param-value>
        </init-param>
      </servlet>
      ...
    </web-app>
Next, let's define some Digester processing rules for this input file:
    digester.addObjectCreate("web-app/servlet",
                             "com.mycompany.ServletBean");
    digester.addCallMethod("web-app/servlet/servlet-name", "setServletName", 0);
    digester.addCallMethod("web-app/servlet/servlet-class",
                           "setServletClass", 0);
    digester.addCallMethod("web-app/servlet/init-param",
                           "addInitParam", 2);
    digester.addCallParam("web-app/servlet/init-param/param-name", 0);
    digester.addCallParam("web-app/servlet/init-param/param-value", 1);
Now, as elements are parsed, the following processing occurs:
- <servlet> - A new com.mycompany.ServletBean object is created, and pushed on to the object stack.
- <servlet-name> - The setServletName() method of the top object on the stack (our ServletBean) is called, passing the body content of this element as a single parameter.
- <servlet-class> - The setServletClass() method of the top object on the stack (our ServletBean) is called, passing the body content of this element as a single parameter.
- <init-param> - A call to the addInitParam method of the top object on the stack (our ServletBean) is set up, but it is not called yet. The call will be expecting two String parameters, which must be set up by subsequent call parameter rules.
- <param-name> - The body content of this element is assigned as the first (zero-relative) argument to the call we are setting up.
- <param-value> - The body content of this element is assigned as the second (zero-relative) argument to the call we are setting up.
- </init-param> - The call to addInitParam() that we have set up is now executed, which will cause a new name-value combination to be recorded in our bean.
- <init-param> - The same set of processing rules is fired again, causing a second call to addInitParam() with the second parameter's name and value.
- </servlet> - The element on the top of the object stack (which should be the ServletBean we pushed earlier) is popped off the object stack.
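The deferred call described in these steps can be sketched as a small object that collects parameters and only invokes the method once the enclosing element ends. This is a simplified illustration, not the CallMethodRule or CallParamRule implementation: PendingCall and the cut-down ServletBean below are hypothetical, and all parameters are assumed to be Strings.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class CallMethodSketch {
    // Cut-down stand-in for the ServletBean described in the text.
    public static class ServletBean {
        private final Map<String, String> initParams = new LinkedHashMap<>();
        public void addInitParam(String name, String value) { initParams.put(name, value); }
        public String getInitParam(String name) { return initParams.get(name); }
    }

    // A method call that is set up first and executed later.
    static class PendingCall {
        private final String methodName;
        private final String[] params;

        PendingCall(String methodName, int paramCount) {
            this.methodName = methodName;
            this.params = new String[paramCount];
        }

        // Corresponds to a call parameter rule: store body text at a zero-relative index.
        void setParam(int index, String bodyText) {
            params[index] = bodyText.trim();
        }

        // Corresponds to the end of the element that set the call up.
        void invoke(Object target) {
            Class<?>[] types = new Class<?>[params.length];
            Arrays.fill(types, String.class);
            try {
                Method method = target.getClass().getMethod(methodName, types);
                method.invoke(target, (Object[]) params);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        ServletBean bean = new ServletBean();
        PendingCall call = new PendingCall("addInitParam", 2); // <init-param>
        call.setParam(0, "config");                             // <param-name>
        call.setParam(1, "/WEB-INF/struts-config.xml");         // <param-value>
        call.invoke(bean);                                      // </init-param>
        System.out.println(bean.getInitParam("config"));
    }
}
```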
Namespace Aware Parsing
For digesting XML documents that do not use XML namespaces, the default behavior of Digester, as described above, is generally sufficient. However, if the document you are processing uses namespaces, it is often convenient to have sets of Rule instances that are only matched on elements that use the prefix of a particular namespace. This approach, for example, makes it possible to deal with element names that are the same in different namespaces, but where you want to perform different processing for each namespace.
Digester does not provide full support for namespaces, but does provide sufficient support to accomplish most tasks. Enabling digester's namespace support is done by following these steps:
- Tell Digester that you will be doing namespace aware parsing, by adding this statement in your initialization of the Digester's properties:
      digester.setNamespaceAware(true);
- Declare the public namespace URI of the namespace with which following rules will be associated. Note that you do not make any assumptions about the prefix - the XML document author is free to pick whatever prefix they want:
      digester.setRuleNamespaceURI("http://www.mycompany.com/MyNamespace");
- Add the rules that correspond to this namespace, in the usual way, by calling methods like addObjectCreate() or addSetProperties(). In the matching patterns you specify, use only the local name portion of the elements (i.e. the part after the prefix and associated colon (":") character):
      digester.addObjectCreate("foo/bar", "com.mycompany.MyFoo");
      digester.addSetProperties("foo/bar");
- Repeat the previous two steps for each additional public namespace URI that should be recognized on this Digester run.
Now, consider that you might wish to digest the following document, using the rules that were set up in the steps above:
    <m:foo
       xmlns:m="http://www.mycompany.com/MyNamespace"
       xmlns:y="http://www.yourcompany.com/YourNamespace">
      <m:bar name="My Name" value="My Value"/>
      <y:bar id="123" product="Product Description"/>
    </m:foo>
Note that your object create and set properties rules will be fired for the
first occurrence of the bar
element, but not the
second one. This is because we declared that our rules only matched
for the particular namespace we are interested in. Any elements in the
document that are associated with other namespaces (or no namespaces at all)
will not be processed. In this way, you can easily create rules that digest
only the portions of a compound document that they understand, without placing
any restrictions on what other content is present in the document.
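The distinction between prefix, namespace URI, and local name that this relies on comes from the underlying SAX parser. The following sketch uses only the JDK's JAXP classes to show it; NamespaceSketch and localNamesIn() are hypothetical names, and the filtering by URI is a stand-in for the rule selection Digester performs, not a copy of it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceSketch {
    // Returns the local names of all elements that belong to the given
    // namespace URI, regardless of the prefix the document author chose.
    static List<String> localNamesIn(String namespaceUri, String xml) {
        List<String> names = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // report URI and local name separately
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes atts) {
                            if (namespaceUri.equals(uri)) {
                                names.add(localName);
                            }
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<m:foo xmlns:m='http://www.mycompany.com/MyNamespace'"
                + " xmlns:y='http://www.yourcompany.com/YourNamespace'>"
                + "<m:bar name='My Name'/><y:bar id='123'/></m:foo>";
        System.out.println(localNamesIn("http://www.mycompany.com/MyNamespace", xml));
    }
}
```

Elements bound to the other namespace are skipped here, in the same spirit as rules that are registered for one namespace URI only.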
You might also want to look at Encapsulated Rule Sets if you wish to reuse a particular set of rules, associated with a particular namespace, in more than one application context.
Using Namespace Prefixes In Pattern Matching
Using rules with namespaces is very useful when you have orthogonal rulesets. One ruleset applies to a namespace and is independent of other rulesets applying to other namespaces. However, if your rule logic requires mixed namespaces, then matching namespace prefix patterns might be a better strategy.
When you set the NamespaceAware
property to false, digester uses
the qualified element name (which includes the namespace prefix) rather than the
local name as the pattern component for the element. This means that your pattern
matches can include namespace prefixes as well as element names. So, rather than
create namespace-aware rules, create pattern matches including the namespace
prefixes.
For example (with NamespaceAware false), the pattern 'foo:bar' will match a top level element named 'bar' in the namespace with (local) prefix 'foo'.
Limitations of Digester Namespace support
Digester does not provide general "xpath-compliant" matching; only the namespace attached to the last element in the match path is involved in the matching process. Namespaces attached to parent elements are ignored for matching purposes.
Pluggable Rules Processing
By default, Digester
selects the rules that match a particular
pattern of nested elements as described under
Element Matching Patterns. If you prefer to use
different selection policies, however, you can create your own implementation
of the org.apache.commons.digester.Rules interface,
or subclass the corresponding convenience base class
org.apache.commons.digester.RulesBase.
Your implementation of the match()
method will be called when the
processing for a particular element is started or ended, and you must return
a List
of the rules that are relevant for the current nesting
pattern. The order of the rules you return is significant,
and should match the order in which rules were initially added.
Your policy for rule selection should generally be sensitive to whether
Namespace Aware Parsing is taking place. In
general, if namespaceAware
is true, you should select only rules
that:
- Are registered for the public namespace URI that corresponds to the prefix being used on this element.
- Match on the "local name" portion of the element (so that the document creator can use any prefix that they like).
ExtendedBaseRules
ExtendedBaseRules adds some additional expression syntax for pattern matching to the default mechanism, but it also executes more slowly. See the JavaDocs for more details on the new pattern matching syntax, and suggestions on when this implementation should be used. To use it, simply do the following as part of your Digester initialization:
Digester digester = ...
...
digester.setRules(new ExtendedBaseRules());
...
RegexRules
RegexRules is an advanced Rules implementation which does not build on the default pattern matching rules. It uses a pluggable RegexMatcher implementation to test if a path matches the pattern for a Rule. All matching rules are returned (note that this behaviour differs from the longest-match policy of the default pattern matching rules). See the JavaDocs for more details.
Example usage:
Digester digester = ...
...
digester.setRules(new RegexRules(new SimpleRegexMatcher()));
...
RegexMatchers
Digester ships with only one RegexMatcher implementation: SimpleRegexMatcher. This implementation is unsophisticated and lacks many of the features found in more powerful regex libraries. There are some good reasons why this approach was adopted. The first is that SimpleRegexMatcher is simple: it is easy to write and runs quickly. The second has to do with the way that RegexRules is intended to be used.
There are many good regex libraries available (for example, Jakarta ORO, Jakarta Regex, GNU Regex and Java 1.4 Regex). Not only do different people have different personal tastes when it comes to regular expression matching, but these products all offer different functionality and different strengths.
The pluggable RegexMatcher
is a thin bridge
designed to adapt other Regex systems. This allows any Regex library the user
desires to be plugged in and used just by creating one class.
Digester does not (currently) ship with bridges to the major regex libraries (to allow the dependencies required by Digester to be kept to a minimum).
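As an illustration of how thin such a bridge can be, here is a sketch that adapts java.util.regex. So that it compiles on its own, it declares a local stand-in for RegexMatcher; the assumption is that the real class exposes a single match(pathPattern, rulePattern) method, which should be checked against the JavaDocs. JavaUtilRegexMatcher is a hypothetical name.

```java
import java.util.regex.Pattern;

// Stand-in for Digester's RegexMatcher so that this sketch is self-contained.
// Assumption: the real class has one abstract method with this shape.
abstract class RegexMatcher {
    public abstract boolean match(String pathPattern, String rulePattern);
}

// Hypothetical bridge that delegates to java.util.regex.
class JavaUtilRegexMatcher extends RegexMatcher {

    @Override
    public boolean match(String pathPattern, String rulePattern) {
        // The rule pattern is treated as a regular expression that must
        // match the whole element path, for example "a/b/c".
        return Pattern.matches(rulePattern, pathPattern);
    }

    public static void main(String[] args) {
        RegexMatcher matcher = new JavaUtilRegexMatcher();
        System.out.println(matcher.match("a/b/c", "a/[^/]+/c"));
        System.out.println(matcher.match("a/b/d", "a/[^/]+/c"));
    }
}
```

With the real base class in place of the stand-in, an instance of such a bridge would be passed to the RegexRules constructor in the same way as SimpleRegexMatcher above.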
Encapsulated Rule Sets
All of the examples above have described a scenario where the rules to be
processed are registered with a Digester
instance immediately
after it is created. However, this approach makes it difficult to reuse the
same set of rules in more than one application environment. Ideally, one
could package a set of rules into a single class, which could be easily
loaded and registered with a Digester
instance in one easy step.
The RuleSet interface (and the convenience base
class RuleSetBase) make it possible to do this.
In addition, the rule instances registered with a particular
RuleSet
can optionally be associated with a particular namespace,
as described under Namespace Aware Processing.
An example of creating a RuleSet
might be something like this:
public class MyRuleSet extends RuleSetBase {

    public MyRuleSet() {
        this("");
    }

    public MyRuleSet(String prefix) {
        super();
        this.prefix = prefix;
        this.namespaceURI = "http://www.mycompany.com/MyNamespace";
    }

    protected String prefix = null;

    public void addRuleInstances(Digester digester) {
        digester.addObjectCreate(prefix + "foo/bar", "com.mycompany.MyFoo");
        digester.addSetProperties(prefix + "foo/bar");
    }
}
You might use this RuleSet as follows to initialize a Digester instance:
Digester digester = new Digester();
... configure Digester properties ...
digester.addRuleSet(new MyRuleSet("baz/"));
A couple of interesting notes about this approach:
- The application that is using these rules does not need to know anything about the fact that the RuleSet being used is associated with a particular namespace URI. That knowledge is embedded inside the RuleSet class itself.
- If desired, you could make a set of rules work for more than one namespace URI by providing constructors on the RuleSet to allow this to be specified dynamically.
- The MyRuleSet example above illustrates another technique that increases reusability -- you can specify (as an argument to the constructor) the leading portion of the matching pattern to be used. In this way, you can construct a Digester that recognizes the same set of nested elements at different nesting levels within an XML document.
Using Named Stacks For Inter-Rule Communication
Digester
is based on Rule
instances working together
to process xml. For anything other than the most trivial processing,
communication between Rule
instances is necessary. Since Rule
instances are processed in sequence, this usually means storing an Object
somewhere where later instances can retrieve it.
Digester
is based on SAX. The most natural data structure to use with
SAX based xml processing is the stack. This allows more powerful processes to be
specified more simply since the pushing and popping of objects can mimic the
nested structure of the xml.
Digester
uses two basic stacks: one for the main beans and the other
for parameters for method calls. These are inadequate for complex processing
where many different Rule
instances need to communicate through
different channels.
In this case, it is recommended that named stacks are used. In addition to the two basic stacks, Digester allows rules to use an unlimited number of other stacks referred to by an identifying string (the name). (That's where the term named stack comes from.) These stacks are accessed through calls to:
- void push(String stackName, Object value)
- Object pop(String stackName)
- Object peek(String stackName)
Note: all stack names beginning with org.apache.commons.digester
are reserved for future use by the Digester
component. It is also recommended
that users choose stack names prefixed by the name of their own domain to avoid conflicts
with other Rule
implementations.
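To make the idea concrete, the sketch below models named stacks with a map from stack name to stack. It is an illustration only, not Digester's implementation: the class NamedStacks is hypothetical, and its behaviour on an empty or unknown stack (throwing NoSuchElementException) is a choice made for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical illustration of named stacks; not Digester's implementation.
class NamedStacks {
    private final Map<String, Deque<Object>> stacks = new HashMap<>();

    // Pushes a non-null value onto the named stack, creating it on first use.
    void push(String stackName, Object value) {
        stacks.computeIfAbsent(stackName, k -> new ArrayDeque<>()).push(value);
    }

    // Removes and returns the top value; throws if the stack is empty or unknown.
    Object pop(String stackName) {
        return existing(stackName).pop();
    }

    // Returns the top value without removing it; throws if empty or unknown.
    Object peek(String stackName) {
        return existing(stackName).element();
    }

    private Deque<Object> existing(String stackName) {
        Deque<Object> stack = stacks.get(stackName);
        if (stack == null) {
            throw new NoSuchElementException("no stack named " + stackName);
        }
        return stack;
    }

    public static void main(String[] args) {
        NamedStacks stacks = new NamedStacks();
        // The stack name is prefixed with a (made-up) domain to avoid clashes.
        String name = "com.example.orders.ids";
        stacks.push(name, "first");
        stacks.push(name, "second");
        System.out.println(stacks.peek(name)); // most recently pushed value
        System.out.println(stacks.pop(name));
        System.out.println(stacks.pop(name));
    }
}
```

One rule would push a value in its begin processing and a later rule would pop or peek it, each agreeing only on the stack name.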
Registering DTDs
Brief (But Still Too Long) Introduction To System and Public Identifiers
A definition for an external entity comes in one of two forms:
SYSTEM system-identifier
PUBLIC public-identifier system-identifier
The system-identifier is a URI from which the resource can be obtained (either directly or indirectly). Many valid URIs may identify the same resource.
The public-identifier
is an additional free identifier which may be used
(by the parser) to locate the resource.
In practice, the weakness with a system-identifier
is that most parsers
will attempt to interpret this URI as a URL, try to download the resource directly
from the URL and stop the parsing if this download fails. So, this means that
almost always the URI will have to be a URL from which the declaration
can be downloaded.
URLs may be local or remote but if the URL is chosen to be local, it is likely only to function correctly on a small number of machines (which are configured precisely to allow the xml to be parsed). This is usually unsatisfactory and so a universally accessible URL is preferred. This usually means an internet URL.
To recap, in practice the system-identifier will (most likely) be an internet URL. Unfortunately, downloading from an internet URL is not only slow but unreliable (since successfully downloading a document from the internet relies on the client being connected to the internet and the server being able to satisfy the request).
The public-identifier
is a freely defined name but (in practice) it is
strongly recommended that a unique, readable and open format is used (for reasons
that should become clear later). A Formal Public Identifier (FPI) is a very
common choice. This public identifier is often used to provide a unique and location
independent key which can be used to substitute local resources for remote ones
(hint: this is why ;).
By using the second (PUBLIC) form combined with some form of local catalog (which matches public-identifiers to local resources), and where the public-identifier is a unique name and the system-identifier is an internet URL, the practical disadvantages of specifying just a system-identifier can be avoided. Those external entities which have been stored locally (on the machine parsing the document) can be identified and used. Only when no local copy exists is it necessary to download the document from the internet URL. This naming scheme is recommended when using Digester.
External Entity Resolution Using Digester
SAX factors out the resolution of external entities into an EntityResolver. Digester supports the use of a custom EntityResolver but ships with a simple internal implementation. This implementation allows local URLs to be easily associated with public-identifiers.
For example:
digester.register("-//Example Dot Com //DTD Sample Example//EN", "assets/sample.dtd");
will make digester return the relative file path assets/sample.dtd
whenever an external entity with public id
-//Example Dot Com //DTD Sample Example//EN
is needed.
Note: This is a simple (but useful) implementation.
Greater sophistication requires a custom EntityResolver
.
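A custom resolver along the lines described above might look like the following sketch, which uses only the JDK's SAX classes. PublicIdCatalog is a hypothetical class, not Digester's internal implementation; to stay self-contained it keeps the DTD text in memory, where a real catalog would more likely point at local files or classpath resources. The system identifier URL in the sample document is made up.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical catalog-style resolver; not Digester's internal implementation.
class PublicIdCatalog implements EntityResolver {
    // Maps public identifiers to DTD text held locally.
    private final Map<String, String> localCopies = new HashMap<>();
    int hits = 0; // number of entities served from the catalog

    void register(String publicId, String dtdText) {
        localCopies.put(publicId, dtdText);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String dtdText = localCopies.get(publicId);
        if (dtdText == null) {
            return null; // unknown: the parser falls back to the system identifier
        }
        hits++;
        InputSource source = new InputSource(new StringReader(dtdText));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    // Parses the document with the given resolver installed.
    static void parse(String xml, EntityResolver resolver) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        PublicIdCatalog catalog = new PublicIdCatalog();
        catalog.register("-//Example Dot Com //DTD Sample Example//EN",
                "<!ELEMENT sample (#PCDATA)>");
        String xml = "<!DOCTYPE sample PUBLIC \"-//Example Dot Com //DTD Sample Example//EN\" "
                + "\"http://www.example.com/dtds/sample.dtd\"><sample>text</sample>";
        parse(xml, catalog); // the DTD comes from the catalog, not the network
        System.out.println("entities served locally: " + catalog.hits);
    }
}
```

Returning null for unregistered identifiers preserves the fallback described earlier: only when no local copy exists does the parser try the system identifier.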
Troubleshooting
Debugging Exceptions
Digester
is based on SAX.
Digestion throws two kinds of Exception
:
java.io.IOException
org.xml.sax.SAXException
The first is rarely thrown and indicates the kind of fundamental IO exception that developers know all about. The second is thrown by SAX parsers when the processing of the XML cannot be completed. So, to diagnose the cause a certain familiarity with the way that SAX error handling works is very useful.
Diagnosing SAX Exceptions
This is a short, potted guide to SAX error handling strategies. It's not intended as a proper guide to error handling in SAX.
When a SAX parser encounters a problem with the xml (well, ok - sometime after it encounters a problem) it will throw a SAXParseException. This is a subclass of SAXException and contains a bit of extra information about what exactly went wrong - and more importantly, where it went wrong. If you catch an exception of this sort, you can be sure that the problem is with the XML and not Digester or your rules.
It is usually a good idea to catch this exception and log the extra information
to help with diagnosing the reason for the failure.
General SAXException instances may wrap a causal exception. When exceptions are thrown by Digester, each of these will be wrapped into a SAXException and rethrown. So, catch these and examine the wrapped exception to diagnose what went wrong.
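The two cases can be told apart in a catch block roughly as sketched below. This uses the JDK SAX parser directly so that it runs on its own; in an application the parse call would be digester.parse(...). The class SaxDiagnostics and the message wording are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; the parse call stands in for digester.parse(...).
class SaxDiagnostics {

    // Parses the XML with the given handler and returns a one-line diagnosis.
    static String diagnose(String xml, DefaultHandler handler) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return "ok";
        } catch (SAXParseException e) {
            // Problem in the XML itself: report where it happened.
            return "XML problem at line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException e) {
            // General SAXException: examine the wrapped exception, if any.
            Exception cause = e.getException();
            return "processing problem: " + (cause != null ? cause : e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Malformed XML: the closing tag does not match.
        System.out.println(diagnose("<a><b></a>", new DefaultHandler()));

        // Well-formed XML, but the handler (standing in for a rule) fails.
        DefaultHandler failing = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attrs) throws SAXException {
                throw new SAXException(new IllegalStateException("rule failed"));
            }
        };
        System.out.println(diagnose("<a/>", failing));
    }
}
```

The SAXParseException clause has to come first, since it is a subclass of SAXException.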
Frequently Asked Questions
- Why do I get warnings when using a JAXP 1.1 parser?
If you're using a JAXP 1.1 parser, you might see the following warning (in your log):
[WARN] Digester - -Error: JAXP SAXParser property not recognized: http://java.sun.com/xml/jaxp/properties/schemaLanguage
This property is needed for JAXP 1.2 (XML Schema support) as required for the Servlet Spec. 2.4 but is not recognized by JAXP 1.1 parsers. This warning is harmless.
- Why Doesn't Schema Validation Work With Parser XXX Out Of The Box?
Schema location and language settings are often needed for validation using schemas. Unfortunately, there isn't a single standard approach to how these properties are configured on a parser. Digester tries to guess the parser being used and configure it appropriately but it's not infallible. You might need to grab an instance, configure it and pass it to Digester.
If you want to support more than one parser in a portable manner, then you'll probably want to take a look at the
org.apache.commons.digester.parsers
package and add a new class to support the particular parser that's causing problems.
- Help! I'm Validating Against Schema But Digester Ignores Errors!
Digester is based on SAX. The convention for SAX parsers is that all errors are reported (to any registered ErrorHandler) but processing continues. Digester (by default) registers its own ErrorHandler implementation. This logs details but does not stop the processing (following the usual convention for SAX based processors).
This means that the errors reported by the validation of the schema will appear in the Digester logs but the processing will continue. To change this behaviour, call digester.setErrorHandler with a more suitable implementation.
- Where Can I Find Example Code?
Digester ships with a sample application: a mapping for the Rich Site Summary format used by many newsfeeds. Download the source distribution to see how it works.
Digester also ships with a set of examples demonstrating most of the features described in this document. See the "src/examples" subdirectory of the source distribution.
- When Are You Going To Support Rich Site Summary Version x.y.z?
The Rich Site Summary application is intended to be a sample application. It works but we have no plans to add support for other versions of the format.
We would consider donations of standard digester applications but it's unlikely that these would ever be shipped with the base digester distribution. If you want to discuss this, please post to the commons dev mailing list.
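Regarding the question above about validation errors being ignored, a "more suitable" ErrorHandler can be as small as the sketch below, which rethrows every reported error so that processing stops. StrictErrorHandler is a hypothetical class. To keep the demonstration self-contained it validates against an inline DTD with the JDK parser rather than against a schema; the error reporting convention is the same.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that turns reported validation errors into failures.
class StrictErrorHandler extends DefaultHandler {

    @Override
    public void error(SAXParseException e) throws SAXParseException {
        throw e; // stop processing instead of merely reporting
    }

    // Validates the XML against its DTD; returns true if parsing ran to the end.
    static boolean completes(String xml, DefaultHandler handler) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // <other/> is not allowed by the DTD, so the document is invalid.
        String xml = "<!DOCTYPE root [<!ELEMENT root (child)><!ELEMENT child (#PCDATA)>]>"
                + "<root><other/></root>";
        // The default error() does nothing, so parsing runs to the end.
        System.out.println(completes(xml, new DefaultHandler()));
        // The strict handler aborts on the first validation error.
        System.out.println(completes(xml, new StrictErrorHandler()));
    }
}
```

Since DefaultHandler implements ErrorHandler, an instance of such a class is the kind of object that would be handed to digester.setErrorHandler.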
Known Limitations
Accessing Public Methods In A Default Access Superclass
There is an issue when invoking public methods contained in a default access superclass.
Reflection locates these methods fine and correctly assigns them as public.
However, an IllegalAccessException
is thrown if the method is invoked.
MethodUtils
contains a workaround for this situation.
It will attempt to call setAccessible
on this method.
If this call succeeds, then the method can be invoked as normal.
This call will only succeed when the application has sufficient security privileges.
If this call fails then a warning will be logged and the method may fail.
Digester
uses MethodUtils
and so there may be an issue accessing methods
of this kind from a high security environment. If you think that you might be experiencing this
problem, please ask on the mailing list.
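The workaround pattern described above can be sketched with plain reflection as follows. This is not the MethodUtils source; LenientInvoker and its nested classes are hypothetical, and because everything here lives in one package the IllegalAccessException itself is not reproduced. The sketch only shows the order of operations: try setAccessible, warn if that is refused, then invoke.

```java
import java.lang.reflect.Method;

// Hypothetical helper showing the workaround pattern; not the MethodUtils source.
class LenientInvoker {

    // Invokes a public no-argument method by name, first trying to
    // suppress access checks on it.
    static Object invoke(Object target, String methodName) throws Exception {
        Method method = target.getClass().getMethod(methodName);
        try {
            method.setAccessible(true);
        } catch (RuntimeException e) {
            // For example a SecurityException in a restricted environment:
            // log a warning and carry on; the invocation below may still fail.
            System.err.println("Cannot call setAccessible on " + method + ": " + e);
        }
        return method.invoke(target);
    }

    // A default access superclass with a public method, and a public subclass.
    static class Base {
        public String greet() {
            return "hello";
        }
    }

    public static class Derived extends Base {
    }

    public static void main(String[] args) throws Exception {
        System.out.println(invoke(new Derived(), "greet"));
    }
}
```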
Interface Summary
- Digester.GeneratedCodeLoader
- DocumentProperties: A collection of interfaces, one per property, that enables the object being populated by the digester to signal to the digester that it supports the given property and that the digester should populate that property if available.
- DocumentProperties.Charset: The character encoding used by the source XML document.
- ObjectCreationFactory: Interface for use with FactoryCreateRule.
- Rules: Public interface defining a collection of Rule instances (and corresponding matching patterns) plus an implementation of a matching policy that selects the rules that match a particular pattern of nested elements discovered during parsing.
- RuleSet: Public interface defining a shorthand means of configuring a complete set of related Rule definitions, possibly associated with a particular namespace URI, in one operation.
- SetPropertiesRule.Listener
Class Summary
- AbstractObjectCreationFactory: Abstract base class for ObjectCreationFactory implementations.
- ArrayStack<E>: Imported copy of the ArrayStack class from Commons Collections, which was the only direct dependency from Digester.
- CallMethodRule: Rule implementation that calls a method on an object on the stack (normally the top/parent object), passing arguments collected from subsequent CallParamRule rules or from the body of this element.
- CallParamRule: Rule implementation that saves a parameter for use by a surrounding CallMethodRule.
- Digester: A Digester processes an XML input stream by matching a series of element nesting patterns to execute Rules that have been added prior to the start of parsing.
- EnvironmentPropertySource: A IntrospectionUtils.SecurePropertySource that uses environment variables to resolve expressions.
- FactoryCreateRule: Rule implementation that uses an ObjectCreationFactory to create a new object which it pushes onto the object stack.
- ObjectCreateRule: Rule implementation that creates a new object and pushes it onto the object stack.
- Rule: Concrete implementations of this class implement actions to be taken when a corresponding nested pattern of XML elements has been matched.
- RulesBase: Default implementation of the Rules interface that supports the standard rule matching behavior.
- ServiceBindingPropertySource: A IntrospectionUtils.SecurePropertySource that uses Kubernetes service bindings to resolve expressions.
- SetNextRule: Rule implementation that calls a method on the (top-1) (parent) object, passing the top object (child) as an argument.
- SetPropertiesRule: Rule implementation that sets properties on the object at the top of the stack, based on attributes with corresponding names.
- SystemPropertySource: A IntrospectionUtils.SecurePropertySource that uses system properties to resolve expressions.