XML, TrafficScript™ and Java Extensions

ZXTM 5.0 allows you to inspect and manipulate both incoming and outgoing traffic with a customized version of Java's Servlet API. In this article we'll delve more deeply into some of the semantics of ZXTM's Java Extensions and show how to validate XML files in both uploads and downloads using TrafficScript™ and Java Extensions.

For more background material on ZXTM's Java Extensions please consult the documents linked to at the end of this article.

The example we'll use to illustrate XML processing in ZXTM is a website that allows users to share music play-lists. We'll first look at the XML capabilities of 'conventional' TrafficScript and then investigate the use of Java Extensions, which were introduced with ZXTM 5.0.

A play-list sharing web site

You have spent a lot of time developing a fancy website where users can upload their personal play-lists, making them available to others who can then search for music they like and download it. Of course you went for XML as the data format, not least because it allows you to make sure uploads are valid. You can therefore help your users' applications by validating their XML files as they are uploaded. Also, to be on the safe side, whenever an application downloads an XML play-list it should be checked and only reach the user if it passes validation.

XML provides the concept of schema files to describe what a valid document has to look like. One popular schema language is the W3C's XML Schema Definition (XSD), see http://www.w3.org/TR/xmlschema-0/. Given an XSD file, you can hand an XML document to a validator to find out whether it actually conforms to the data structure specified in the schema.
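
To make the idea of handing a document to a validator concrete, here is a minimal, self-contained sketch using the standard javax.xml.validation API that the Servlet later in this article also relies on. The tiny inline schema and the class name XsdCheck are made up for illustration; it is not the xspf schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdCheck {
    // Hypothetical toy schema: a <playlist> holding one or more <title> strings.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='playlist'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='title' type='xs:string' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    /** Returns null if the document conforms, otherwise the validator's message. */
    public static String validate(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException | IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Conforms to the toy schema, so this prints "null".
        System.out.println(validate("<playlist><title>IV</title></playlist>"));
        // <artist> is not declared in the schema, so this prints an error message.
        System.out.println(validate("<playlist><artist>x</artist></playlist>"));
    }
}
```

With the real xspf schema you would load the XSD from a file instead of a string; the rest of the flow stays the same.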

Coming back to our example of a play-list sharing website, you have downloaded the popular xspf (XML Shareable Playlist Format, 'spiff') schema description from http://xspf.org/validation/xspf-1_0.2.xsd.
One of the tags allowed inside a track in XML files of this type is image. By specifying tags like <image>http://images.amazon.com/images/P/B000002J0B.01.MZZZZZZZ.jpg</image>, a play-list can attach cover art to its tracks, so that a user sees pictures such as these:

[Images: album covers for Remasters, III, Houses of the Holy and IV]

Validating XML with TrafficScript™

How do you validate an XML file from a user against that schema? ZXTM's TrafficScript has actually provided this functionality since version 2.0 with the xml.validate.xsd() function. Here's a simple rule to check the response of a web server against an XSD:

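# Read the whole response body (0 means "no size limit")
$doc = http.getResponseBody( 0 );
# Load the schema from ZXTM's extra files
$schema = resource.get( "xspf.xsd" );
$result = xml.validate.xsd( $doc, $schema );
if( 1 == $result ) {
   log.info( "Validation succeeded" );
} else if( 0 == $result ) {
   log.info( "Validation failed" );
} else {
   # Any other value is treated as an error
   log.info( "Validation error" );
}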

Let's have a closer look at what this rule does. First, it reads in the whole response by calling http.getResponseBody(). This function takes a single argument specifying the amount of data to read in. By passing zero, we tell ZXTM to give us the whole response. This function is very practical, but you have to be extremely careful with it, because you do not know beforehand how big the response actually is. It might be an audio stream totaling many hundreds of megabytes. Surely you don't want ZXTM to buffer all that data. Therefore, when using http.getResponseBody(), in particular with a size argument of zero, you should always check the MIME type and the content length of the response (the request rule further down shows such a length check).
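
The decision of whether a body is safe to buffer can be sketched as a small helper. The version below is in Java rather than TrafficScript and is only an illustration: the class name BufferGate, the 1 megabyte limit and the list of accepted media types are assumptions, not part of ZXTM.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class BufferGate {
    // Assumed limit and media types; adjust to whatever your site serves.
    static final long MAX_BYTES = 1024 * 1024;
    static final List<String> XML_TYPES =
        Arrays.asList("application/xspf+xml", "application/xml", "text/xml");

    /** True if a body with this Content-Type and length is safe to read in full. */
    public static boolean shouldBuffer(String contentType, long contentLength) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=utf-8" before comparing.
        String type = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return XML_TYPES.contains(type)
            && contentLength > 0
            && contentLength <= MAX_BYTES;
    }

    public static void main(String[] args) {
        // A small play-list: prints "true".
        System.out.println(shouldBuffer("application/xspf+xml; charset=utf-8", 2048));
        // A 300 MB audio stream: prints "false".
        System.out.println(shouldBuffer("audio/mpeg", 300L * 1024 * 1024));
    }
}
```

Only when such a check passes would you go on to read the whole body and validate it.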

Our rule then loads the schema definition file with resource.get(); the file must be located in $ZEUSHOME/zxtm/extra/ for this step to work. Finally, it does the actual validation and checks the result. In this simple example we only log the result; on your music-sharing web site you would have to take the appropriate action.

The last rule was a response rule that worked on the result from the back-end web server. These files are actually under your control (at least theoretically), so validation is not that urgent. Things are different if you allow uploads to your web site. Any user-provided data must be validated before you let it through to your back-ends. The following request rule does the XML validation for you:

$m = http.getMethod();
if( 0 == string.icmp( $m, "POST" ) ) {
   $clen = http.getHeader("Content-Length");
   if( $clen > 0 && $clen <= 1024*1024 ) {
      $schema = resource.get( "xspf.xsd" );
      $doc = http.getBody( 0 );
      $result = xml.validate.xsd( $doc, $schema );
      # handle result
   } else {
      # handle over-sized posts
   }
}

Note how we first look at the HTTP method, then retrieve the length of the post's body and check it. That check, which is done in the line

if( $clen > 0 && $clen <= 1024*1024 ) {

deserves a bit more comment: The variable $clen was initialized from the post's Content-Length header, so it could be the empty string at that stage. When TrafficScript converts data to integers, variables that do not actually represent numbers are converted to 0 (see the TrafficScript reference for more details). Therefore, we have to check that $clen is greater than zero and at most 1 megabyte (or whatever limit you choose to impose on the size of uploads). After checking the content length we can safely invoke http.getBody( 0 )¹.
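
The same guard can be written out in Java to show why the lower bound matters. This is a sketch that only mimics the one conversion case described above (non-numeric input becomes 0); TrafficScript's full conversion rules are in its reference, and the names used here are made up.

```java
class ContentLengthCheck {
    // 1 megabyte, matching the limit used in the rule above.
    static final long MAX_UPLOAD = 1024 * 1024;

    /** Mimics the behavior described above: anything that is not a number becomes 0. */
    public static long toIntOrZero(String header) {
        if (header == null) {
            return 0;
        }
        try {
            return Long.parseLong(header.trim());
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    /** True only for a length that is present, positive and within the limit. */
    public static boolean acceptable(String header) {
        long clen = toIntOrZero(header);
        return clen > 0 && clen <= MAX_UPLOAD;
    }

    public static void main(String[] args) {
        System.out.println(acceptable(""));        // missing header: prints "false"
        System.out.println(acceptable("4096"));    // small upload: prints "true"
        System.out.println(acceptable("1048577")); // one byte over the limit: prints "false"
    }
}
```

Without the "greater than zero" half of the test, a missing or garbled header would convert to 0 and slip under the upper bound.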

Validating XML with ZXTM's Java Extensions

After having explored TrafficScript's built-in XML support, let's now see how XML validation can be done using Java Extensions.

If you are at all familiar with Java Servlets, ZXTM's Java Extensions should feel familiar. The main differences are:

  • You have a lot of ZXTM's built-in functionality ready at hand via attributes.
  • You can manipulate both the response (as in conventional Servlets) and the request (unique to ZXTM's Java Extensions, as ZXTM sits between the client and the server).

The following diagram shows the flow of data when a ZXTM Java Extension is used:

[Diagram: data flow through a ZXTM Java Extension]

The interesting thing here is that this flow actually applies twice: First when the request is sent to the server (you can invoke the Java extension from a request rule) and then again when the response is sent back to the client (allowing you to change the result from a response rule). This is very practical for your music-sharing web site as you only have to write one Servlet. However, you have to be able to tell whether you are working on the response or the request. The ZXTMHttpServletResponse object which is passed to both the doGet() and doPost() methods of the HttpServlet object has a method to find out which direction of the traffic flow you are currently in: boolean isResponseRule(). This distinction is never needed in conventional Servlet programming as in that scenario it's the Servlet's task to create the response, not to modify an existing response.

These considerations make it easy to design the ZXTM Servlet for your web site:

  1. There will be an init() method to read in the schema definition and to set up the javax.xml.validation.Validator object.
  2. We'll have a single private validate() method to do the actual work.
  3. The doGet() method will invoke validate() on the server's response, whereas
  4. the doPost() method does the same on the body of the request.

After all that theory it's high time for some real code (note that any import directives have been removed for the sake of readability as they don't add anything to our discussion):

public class XmlValidate extends HttpServlet {
   private static final long serialVersionUID = 1L;
   // Shared by all requests; access is serialized in validate()
   private static Validator validator = null;

   public void init( ServletConfig config ) throws ServletException {
      super.init( config );
      // Path to the XSD, supplied as an init parameter of the Servlet
      String schema_file = config.getInitParameter("schema_file");

      if( schema_file == null )
         throw new ServletException("No schema file specified");

      SchemaFactory factory = SchemaFactory.newInstance(
         XMLConstants.W3C_XML_SCHEMA_NS_URI);

      Source schemaFile = new StreamSource(new File(schema_file));
      try {
         // Compile the schema once and keep a Validator for later use
         Schema schema = factory.newSchema(schemaFile);
         validator = schema.newValidator();
      } catch( SAXException saxe ) {
         // Keep the original exception as the cause for easier debugging
         throw new ServletException(saxe.getMessage(), saxe);
      }
   }
// ... other methods below
}

The validate() function is actually very simple as all the hard work is done inside the Java library. The only thing to be careful about is to make sure that we don't allow concurrent access to the Validator object from multiple threads:

   // Returns true if the stream validates; otherwise writes an error
   // message to the response and returns false.
   private boolean validate( InputStream in, HttpServletResponse res, String errmsg )
      throws IOException
   {
      Source src = new StreamSource(in);
      try {
         // Validator objects are not thread-safe, so serialize access
         synchronized( validator ) {
            validator.validate(src);
         }
      } catch( SAXException saxe ) {
         String msg = saxe.getMessage();
         res.setContentType("text/plain");
         PrintWriter out = res.getWriter();
         out.println(errmsg);
         out.print("Validation of the xml file has failed with error message: ");
         out.println(msg);
         return false;
      }
      return true;
   }

Note that the only thing we have to do in case of a failure is to write to the stream that makes up the response. No matter whether this is being done in a request or a response rule, ZXTM will take that as an indication that this is what should be sent back to the client. In the case of a request rule, ZXTM won't even bother to hand on the request to a back-end server and instead send the result of the Java Servlet; in a response rule, the server's answer will be replaced by what the Servlet has produced.
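
The synchronized block in validate() serializes all validations through a single Validator. An alternative worth considering is sketched below, outside the Servlet context: the javax.xml.validation documentation describes Schema objects as thread-safe, while Validator objects are not, so you can share the compiled Schema and create a fresh Validator for each call. The class name PerCallValidator and the toy schema are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class PerCallValidator {
    // A compiled Schema may be shared between threads.
    private final Schema schema;

    public PerCallValidator(String xsd) {
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Schema could not be compiled", e);
        }
    }

    /** Validates with a fresh Validator, so no lock is needed. */
    public boolean isValid(String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical one-element schema, just enough to exercise the class.
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                   + "<xs:element name='a' type='xs:string'/></xs:schema>";
        PerCallValidator v = new PerCallValidator(xsd);
        System.out.println(v.isValid("<a>hi</a>")); // prints "true"
        System.out.println(v.isValid("<b/>"));      // prints "false"
    }
}
```

Which variant is faster depends on how expensive Validator creation is compared to the time threads spend waiting for the lock, so it is worth measuring on your own traffic.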

Now we are ready for the doGet() method:

   public void doGet( HttpServletRequest req, HttpServletResponse res )
      throws ServletException, IOException
   {
      try {
         ZXTMHttpServletResponse zres = (ZXTMHttpServletResponse) res;
         if( !zres.isResponseRule() ) {
            log("doGet called in request rule ... bailing out");
            return;
         }

         InputStream in = zres.getInputStream();
         validate(in, zres, "The file you requested was rejected.");
      } catch( Exception e ) {
         throw new ServletException(e.getMessage());
      }
   }

There's not really much work left apart from calling our validate() method with the error message to show in case of failure. As discussed previously, we make sure that we are actually working in the context of a response rule because otherwise the response would be empty. Exactly the opposite has to be done when processing a post:

   public void doPost( HttpServletRequest req, HttpServletResponse res )
      throws ServletException, IOException
   {
      try {
         ZXTMHttpServletRequest zreq = (ZXTMHttpServletRequest) req;
         ZXTMHttpServletResponse zres = (ZXTMHttpServletResponse) res;

         if( zres.isResponseRule() ) {
            log("doPost called in response rule ... bailing out");
            return;
         }

         InputStream in = zreq.getInputStream();
         if( validate(in, zres, "Your upload was unsuccessful") ) {
            // just let the post through to the backends
         }
      } catch(Exception e) {
         throw new ServletException(e.getMessage());
      }
   }

The only things missing are the rules to invoke the Servlet, so here they are (assuming that the Servlet has been uploaded via the 'Java' tab of the 'Catalogs' section in ZXTM's UI as a file called XmlValidate.class). First the request rule:

$m = http.getMethod();
if( 0 == string.icmp( $m, "POST" ) ) {
   java.run( "iXml.XmlValidate" );
}

and the response rule is almost the same:

$m = http.getMethod();
if( 0 == string.icmp( $m, "GET") ) {
   java.run( "iXml.XmlValidate" );
}


It's your choice: TrafficScript™ or Java Extensions

So now you are left with a difficult decision: you have two implementations of the same functionality, so which one do you choose? Bearing in mind that the unassuming java.run() leads to a considerable amount of inter-process communication between the ZXTM child process and the Java Servlet runner, whereas xml.validate.xsd() is handled in C++ inside the same process, the choice is rather obvious. But there are still situations in which you might prefer the Java solution. One example is XML processing that is not supported directly by ZXTM: Java is more flexible and complete in the XML support it provides. There is another advantage to using Java: you can replace the actual implementation of the XML functionality. You might, for instance, want to use Intel's XML Software SuiteJ for Java. But how do you tell ZXTM's Java runner to use another XML library? Only two settings have to be adapted:

java!classpath	/opt/intel/xmlsoftwaresuite/java/1.0/lib/intel-xss.jar
java!command	java -Djava.library.path=/opt/intel/xmlsoftwaresuite/java/1.0/bin/intel64 -server

This applies if you have installed Intel's XML Software SuiteJ in /opt/intel/xmlsoftwaresuite/java/1.0/ and are using the 64 bit version of the shared library. Both changes can be made in the 'Global Settings' tab of the 'System' section in ZXTM's UI.
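
After changing these settings you may want to confirm which XML implementation the Java runner actually picks up. The following sketch prints the class that the standard JAXP lookup returns for XSD validation. It assumes that the replacement library plugs in through that standard lookup, which this article does not confirm for any particular product; the class name WhichXml is made up.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

class WhichXml {
    /** Name of the SchemaFactory implementation the JAXP lookup selects. */
    public static String schemaFactoryImpl() {
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .getClass().getName();
    }

    public static void main(String[] args) {
        // The output depends on the JVM and on the classpath in use.
        System.out.println(schemaFactoryImpl());
    }
}
```

The same call could be logged once from the Servlet's init() method to see the result inside ZXTM's Java runner rather than on the command line.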

Further Reading

Legal Notice

The case of a play-list sharing site has been chosen as an example only and purely for illustrative purposes. Zeus Technology strongly recommends that you take legal advice before putting copyrighted material on the internet.



¹A malicious user might have faked the HTTP header to specify a length larger than the actual post. This would lead ZXTM to try to read more data than the client sends, tying up a file descriptor until the connection times out. Due to ZXTM's non-blocking IO multiplexing, however, other requests would be processed normally.

michael [Zeus Dev Team], 18 June 2008
