Xerces, XmlBeans and XML Schema Import Resolver

April 30, 2010

I’ve been developing a framework to provide XML validation/assurance tools based on a range of parser/validator implementations in Java. A recent experience warranted flagging, given I’ve been surprised at how few credible examples there are out there on yonder interweb.

The Headline

XmlBeans 2.5 and Xerces 2.9 behave differently when using a runtime schema/resource resolver. XmlBeans relies on the use of the EntityResolver interface whereas Xerces relies on the LSResourceResolver interface, but the way the XMLSchema <import /> statement is put together causes some consistency problems in terms of how these resolvers are used…

The Goal

Take a WSDL (with inline-schema) and then use that WSDL to validate XML instance documents. You cannot validate directly with a WSDL, and therefore as a developer you spend time messing with tools to extract schema and so forth. This tool will greatly simplify this process among other things. This approach involves unpacking the inline XML Schema documents in such a way that multiple XSD fragments can be utilised at the point of validation.

The Approach

Open the WSDL and extract the schemas, taking care to maintain the namespace declarations from the outer WSDL definitions section, now that the XSDs will become independent files and unable to inherit them through context. Spin up a version of XmlBeans and Xerces, and generate an n-way schema validation report, thus catering for variations in the level of compliance in any one validating parser.
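As a rough illustration of the extraction step, here is a minimal sketch using only the JDK's DOM and transformer APIs. The class name InlineSchemaExtractor and its method names are made up for this example and are not part of the framework described here; the sketch assumes the WSDL is available as a string and simply copies every in-scope xmlns declaration from the ancestors onto each inline schema before serialising it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: pulls inline xsd:schema elements out of a WSDL string.
final class InlineSchemaExtractor {

    /** Returns each inline schema as a standalone XSD string. */
    static List<String> extract(String wsdl) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wsdl)));

            NodeList schemas = doc.getElementsByTagNameNS(
                    XMLConstants.W3C_XML_SCHEMA_NS_URI, "schema");
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            List<String> result = new ArrayList<String>();
            for (int i = 0; i < schemas.getLength(); i++) {
                Element schema = (Element) schemas.item(i);
                copyInheritedNamespaces(schema);
                StringWriter out = new StringWriter();
                serializer.transform(new DOMSource(schema), new StreamResult(out));
                result.add(out.toString());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not extract schemas from WSDL", e);
        }
    }

    /** Copies xmlns declarations from all ancestors onto the schema; the nearest ancestor wins. */
    private static void copyInheritedNamespaces(Element schema) {
        for (Node n = schema.getParentNode(); n instanceof Element; n = n.getParentNode()) {
            NamedNodeMap attrs = n.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr a = (Attr) attrs.item(i);
                boolean isNsDecl =
                        XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(a.getNamespaceURI());
                if (isNsDecl && !schema.hasAttributeNS(a.getNamespaceURI(), a.getLocalName())) {
                    schema.setAttributeNS(a.getNamespaceURI(), a.getName(), a.getValue());
                }
            }
        }
    }
}
```

Without the copy step, a prefix that is declared only on the WSDL definitions element but used inside an attribute value such as type="tns:OrderType" would be left undeclared in the standalone XSD.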

The Problem

Some of the WSDL documents are very complex. They contain many, sometimes hundreds, of hierarchically related XML Schema fragments, each of which ‘imports’ the namespaces of numerous other schema fragments. Clearly when I separate the XSDs from the WSDL these imports cannot resolve through inherited context, so I have to make sure that I can resolve them at runtime when the validating parsers are fed the root schema. Problem is the inline schemas in the WSDL contain only a namespace import:

<import namespace="http://some.other.namespace" />

And when this is exported to a standalone XSD file, you need to use a runtime SchemaResolver to acquire the other related XSD files such that the XmlBeans or Xerces engines can assimilate all of the types and namespaces necessary to validate the specified XML instance document. I spent a lot of time diagnosing why my XmlBeans implementation was not triggering the necessary ‘tell me where this schema lives?’ events, such that element references were remaining unresolved, and the validation of the XML instance document was failing. With exactly the same input artifacts the Xerces resolver was happily going about its business and allowing me to feed it schema fragments all the way home.

So Watch Out For This…

If you are using XmlBeans and you want to use a root XMLSchema to validate an XML document, and that root XMLSchema contains <import /> statements, then each import must have a schemaLocation present.

<import namespace="http://some.other.namespace" schemaLocation="file:/some.where"/>

Only with the schemaLocation present will the core XmlBeans.compileXsd() function call out to your custom EntityResolver such that you can then map the requested ‘namespace’ to a physical input source. On the other hand the Xerces parser will work with the shorter format which is already present in my extracted schemas:

<import namespace="http://some.other.namespace" />
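To make the Xerces side concrete, here is a minimal sketch of an LSResourceResolver that answers namespace-only imports. It goes through the JDK's JAXP validation API (which is Xerces-based) rather than a standalone Xerces 2.9, and the class name NamespaceResourceResolver plus the in-memory namespace-to-XSD map are assumptions made purely for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.SAXException;

// Hypothetical resolver: supplies schema text keyed by the imported namespace.
final class NamespaceResourceResolver implements LSResourceResolver {
    private final Map<String, String> xsdByNamespace; // assumed in-memory store
    private final DOMImplementationLS ls;

    NamespaceResourceResolver(Map<String, String> xsdByNamespace) {
        this.xsdByNamespace = xsdByNamespace;
        try {
            this.ls = (DOMImplementationLS)
                    DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        } catch (Exception e) {
            throw new IllegalStateException("No DOM Load/Save implementation available", e);
        }
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI, String publicId,
                                   String systemId, String baseURI) {
        // For <import namespace="..."/> with no schemaLocation, systemId is expected
        // to be null, so the lookup is keyed on namespaceURI alone.
        String xsd = xsdByNamespace.get(namespaceURI);
        if (xsd == null) {
            return null; // let the parser fall back to its default behaviour
        }
        LSInput input = ls.createLSInput();
        input.setStringData(xsd);
        input.setSystemId(namespaceURI);
        return input;
    }

    /** Compiles the root schema, resolving its imports from the supplied map. */
    static Schema compile(String rootXsd, Map<String, String> others) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            factory.setResourceResolver(new NamespaceResourceResolver(others));
            return factory.newSchema(new StreamSource(new StringReader(rootXsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Schema did not compile", e);
        }
    }

    /** Returns true when the instance document validates against the schema. */
    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the sketch is only that the resolver callback carries the namespace, so a namespace-only import gives it enough to work with.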

Now the only problem here is that I did not want to modify the XMLSchema artifacts I was extracting from the WSDL, on the basis that I am offering a service based on the inputs supplied. However, I have been forced into a position of needing to inject a schemaLocation attribute into every <import /> in the XSDs as I’m extracting them from the parent WSDL. That said, I am not specifying a physical location for the schemaLocation, merely reiterating the same generic namespace. I do this just to force the trigger of the resolver events, from which I then use only the supplied namespace="http://some.other.namespace" value, in conjunction with some context information that I know about where I put the schemas in my file-system, to resolve and supply the linkage to the required schema. This is sufficiently generic that I don’t deem it to be overly intrusive into the base schema, and I end up with the following modified import statements:

<import namespace="http://external.namespace" schemaLocation="http://external.namespace"/>
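The injection itself can be sketched with the JDK DOM API. The class name SchemaLocationInjector is hypothetical, and the sketch assumes each extracted XSD is held as a string: any import that names a namespace but has no schemaLocation gets the namespace echoed into a new schemaLocation attribute, and existing schemaLocation values are left alone.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: echoes the import namespace into a schemaLocation attribute.
final class SchemaLocationInjector {

    static String inject(String xsd) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xsd)));

            NodeList imports = doc.getElementsByTagNameNS(
                    XMLConstants.W3C_XML_SCHEMA_NS_URI, "import");
            for (int i = 0; i < imports.getLength(); i++) {
                Element imp = (Element) imports.item(i);
                String namespace = imp.getAttribute("namespace"); // "" when absent
                if (namespace.length() > 0 && !imp.hasAttribute("schemaLocation")) {
                    // Not a physical location: just enough to trigger the resolver.
                    imp.setAttribute("schemaLocation", namespace);
                }
            }

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not rewrite schema imports", e);
        }
    }
}
```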

With this format I get uniform behaviour between XmlBeans and Xerces. Both call out to my runtime resolvers, which are then able to use a simple mapping scheme to supply the schema content, and I am now able to operate consistently.
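For the XmlBeans side, the ‘simple mapping scheme’ could look something like the sketch below. It is a plain SAX EntityResolver and depends only on the JDK; the class name NamespaceEntityResolver and the namespace-to-file map are assumptions for illustration. It also assumes the echoed namespace arrives unchanged as the systemId, which is worth confirming in your own setup.

```java
import java.io.File;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: maps the echoed namespace to a schema file on disk.
final class NamespaceEntityResolver implements EntityResolver {
    private final Map<String, File> fileByNamespace; // assumed lookup built during extraction

    NamespaceEntityResolver(Map<String, File> fileByNamespace) {
        this.fileByNamespace = fileByNamespace;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // systemId is assumed to be the schemaLocation value, i.e. the namespace itself.
        File xsd = fileByNamespace.get(systemId);
        if (xsd == null) {
            return null; // unknown namespace: leave it to the default resolution
        }
        return new InputSource(xsd.toURI().toString());
    }
}

// Assumed wiring on the XmlBeans side (not compiled here, needs the XmlBeans jar):
//   XmlOptions options = new XmlOptions();
//   options.setEntityResolver(new NamespaceEntityResolver(files));
//   ... then pass options to XmlBeans.compileXsd(...)
```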

This did take me a long time to diagnose, so hopefully it will be of use to others.


Spring MVC: Serving Static and Dynamic Content

April 6, 2010

I have been revisiting SpringMVC recently as a basis for creating a web application. I had a positive experience some time ago building out a RESTful abstraction layer across a range of disparate data-sources, primarily for programmatic access, so I wanted to build on that learning and create a fully featured web-app this time around.

Whilst I really like the Spring MVC framework, I am completely amazed that in some cases it is almost impossible to find simple explanations about relatively simple things involved in building out a web-application. This post is motivated by that lack of clarity, and after a few days of googling, I now believe I’ve evolved at least one of the myriad common-sense approaches to serving up both static and dynamic content from a Spring MVC application.

Firstly my environment:

Mac OS X 10.6
Java 1.6.0_17
Spring 2.5.5
Tomcat 6.0.24

Secondly the goal was to create a simple presentation layer with a largely static html homepage, with embedded images, with links to Spring controllers which would deliver dynamic content through JSPs. When I started down this road I had assumed this would be pretty common, but my experiences on Google have proven that wrong so here goes…

As a backdrop for my own variations I used the very helpful article “Developing a Spring MVC Application step-by-step” which is great, but lulled me into a false sense of security before dropping me flat just as the real questions started emerging. I followed all steps in that tutorial and progressed very quickly to section 2, at which point I stopped, because I was less concerned with the addition of business logic, service POJOs and a persistence tier than I was about breaking away from the text-only presentation layer used in the example. So I have to point out: I’m not a presentation guy, I rarely move into the front-end discipline, but in this situation I have to, and as a result my inclination is immediately to want to know how this static/dynamic stuff works.

So at section 2.1 in the tutorial I started to feel uneasy! Before I delve into this in detail let me backtrack.

1. I have a basic project structure as per the tutorial, with a web.xml and a <servlet>-context.xml for my primary controller. The relationship between these config files is covered in detail in the tutorial.

2. All of my JSPs live in the WEB-INF/jsp sub-folder of my source tree, and are packaged into that location in the WAR I deploy into Tomcat (I cover how these are located when I explain my view resolver later in the article).

3. I have a single index.jsp in the war/ root folder, and this index.jsp is referred to in my web.xml as follows:

<servlet>
    <servlet-name>frontEndController</servlet-name>
    <servlet-class>
        org.springframework.web.servlet.DispatcherServlet
    </servlet-class>
    <load-on-startup>1</load-on-startup>
</servlet>
<servlet-mapping>
    <servlet-name>frontEndController</servlet-name>
    <url-pattern>*.htm</url-pattern>
</servlet-mapping>
<welcome-file-list>
    <welcome-file>
        index.jsp
    </welcome-file>
</welcome-file-list>

No surprises here. I’m pushing all the *.htm URLs entering my application context (i.e. requests arriving at Tomcat for host/port/context-root) towards my frontEndController, which I’ll come back to in a short while, but more specifically you’ll see that I’m using the *.htm naming convention in my presentation layer for all the links into my application. In other words I’m not allowing users to directly reference *.jsp files in my app. I could use any suffix for that matter (*.private for example), given it is only relevant for routing requests for content into my controller so my MVC engine can deal with it.

Now, I did have a static HTML index.html as my welcome page in the WAR root folder, from where it was directly accessible by Tomcat. As such I could type http://localhost:8080/myapp/index.html and get my page back OK, but there is a Spring MVC/JSP convention for not doing this and instead redirecting the request for the ‘welcome’ page into the MVC/JSP engine, such that it becomes instrumented and visible to the Spring context. This is covered in section 2.1 of the tutorial, but for the record my war/index.html is now war/index.jsp, and it contains the following:

<%@ include file="/WEB-INF/jsp/include.jsp" %>
<%-- Redirected - we can't set the welcome page to a virtual URL. --%>
<c:redirect url="/index.htm"/>

where include.jsp pulls in the JSP tag libraries needed to support features such as this redirect:

<%@ page session="false"%>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<%@ taglib prefix="fmt" uri="http://java.sun.com/jsp/jstl/fmt" %>

So what we have here is that a request to my application for ‘/’ or ‘index.jsp’ will redirect to a request for the virtual ‘index.htm’ page. This forces the request into my SpringMVC controller ‘frontEndController’ on the basis of the *.htm pattern-match. At that point I simply wanted to return a JSP, containing html source, but also containing static images. Try as I might I could not find a simple explanation of how to achieve this !!

So here it is. Firstly, only requests for dynamic content are pushed into the frontEndController. The *.htm pattern is a ‘codename’ in my application context for ‘give me some JSP content’. Fine. But what about the statics that those dynamics might require?? !!WATCH-OUT!! I used a /ui/* pattern initially in my <servlet>-context.xml which caused me problems with static content. Any embedded CSS or IMG resources also fall under that path name, and as such all requests hitting Tomcat for /ui/page.htm or /ui/img/banner.jpg were being pushed into my frontEndController, from where I could not serve the images, nor did I want to. As such, changing to the *.htm pattern means that only the htm page names are pushed into the MVC controller and all embedded resources can be managed separately.
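To see why the two patterns behave so differently, here is a deliberately simplified sketch of servlet-style URL pattern matching. It is not the container's real implementation: the class name UrlPatternSketch is made up, and only the two pattern shapes discussed here (a path prefix such as /ui/* and an extension such as *.htm) are handled.

```java
// Simplified illustration of path-prefix versus extension matching.
final class UrlPatternSketch {

    /** Supports only "/prefix/*" and "*.ext" patterns, plus exact matches. */
    static boolean matches(String pattern, String path) {
        if (pattern.endsWith("/*")) {
            // Path-prefix mapping: everything under the prefix is captured.
            String prefix = pattern.substring(0, pattern.length() - 2);
            return path.equals(prefix) || path.startsWith(prefix + "/");
        }
        if (pattern.startsWith("*.")) {
            // Extension mapping: only paths ending in that extension are captured.
            return path.endsWith(pattern.substring(1));
        }
        return pattern.equals(path);
    }

    public static void main(String[] args) {
        String[] paths = { "/ui/page.htm", "/ui/img/banner.jpg" };
        for (String path : paths) {
            System.out.println(path
                    + "  /ui/* -> " + matches("/ui/*", path)
                    + "  *.htm -> " + matches("*.htm", path));
        }
    }
}
```

With /ui/* both the page and the image are routed to the controller, whereas with *.htm only the page is.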

There are 3 key messages here. Firstly my index.jsp in the WEB-INF is not really dynamic (other than I have used a templating solution to assemble my pages, but that is irrelevant here). But by handling it in this way I have ensured that ‘all’ html content is generated/served under the control of my MVC engine, and is therefore visible to all the nice things such as logging, access control, and all the other stuff I ain’t even thought of yet that I’m able to implement as cross-cutters in my MVC context. I don’t have a blind-spot where, for example, my static content may be being hammered yet I’m seeing low usage on my MVC generated content.

Secondly you do NOT have to implement a custom controller for JSPs which ARE largely static content. The normal flow is that the <servlet>-context.xml for the relevant controller declared in web.xml offers a second level of routing. Normal convention would be to add an entry such as:

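<!-- class attribute added on the assumption that this is a SimpleUrlHandlerMapping, which owns the urlMap property -->
<bean class="org.springframework.web.servlet.handler.SimpleUrlHandlerMapping">
    <property name="urlMap">
        <map>
            <entry key="/index.htm" value-ref="homepageController" />
        </map>
    </property>
</bean>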

which pushes such requests to the POJO controller declared in that same XML file as below:

<bean name="homepageController"
      class="com.myapp.web.HomepageController" />

But this forces me to implement a Java object HomepageController to effectively do ‘nothing’ other than return a view name of ‘index’, which then resolves to the ‘static’ index.jsp returned to the client. Have no fear, there is a better way: we use something called the UrlFilenameViewController. As such, for any ‘pages’ that I want to serve in my MVC engine which are static, I can re-wire the request as follows:

<!-- class attribute added on the assumption that this is a SimpleUrlHandlerMapping, which owns the urlMap property -->
<bean class="org.springframework.web.servlet.handler.SimpleUrlHandlerMapping">
    <property name="urlMap">
        <map>
            <entry key="/index.htm" value-ref="urlFilenameViewController" />
            <entry key="/static-about.htm" value-ref="urlFilenameViewController" />
            <entry key="/termsandconditions.htm" value-ref="urlFilenameViewController" />
        </map>
    </property>
</bean>
<!-- For direct mapping between URL (i.e. index.htm -> index) and the JSP to render -->
<bean id="urlFilenameViewController"
      class="org.springframework.web.servlet.mvc.UrlFilenameViewController"/>

By convention this means that I would need to have 3 JSP files in my WEB-INF/jsp folder: index.jsp, static-about.jsp, and termsandconditions.jsp. The urlFilenameViewController simply converts the requested html resource name into a token, which is passed to the view-resolver in the same servlet context, which then needs to be configured to look in the WEB-INF/jsp folder as such:

<bean id="viewResolver" class="org.springframework.web.servlet.view.InternalResourceViewResolver">
    <property name="viewClass" value="org.springframework.web.servlet.view.JstlView" />
    <property name="prefix" value="/WEB-INF/jsp/" />
    <property name="suffix" value=".jsp" />
</bean>
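The URL-to-JSP convention can be illustrated in a few lines of plain Java. This is only a sketch of the naming convention, not Spring's actual code; the class name ViewNameSketch and its methods are invented for the example, and the prefix and suffix are the ones from the view resolver configuration above.

```java
// Illustration of the convention: URL path -> view name -> JSP location.
final class ViewNameSketch {

    /** Strips the leading slash and the file extension, e.g. "/index.htm" -> "index". */
    static String viewName(String urlPath) {
        int begin = urlPath.startsWith("/") ? 1 : 0;
        int dot = urlPath.lastIndexOf('.');
        int end = (dot > urlPath.lastIndexOf('/')) ? dot : urlPath.length();
        return urlPath.substring(begin, end);
    }

    /** Applies the resolver's prefix and suffix to the view name. */
    static String jspPath(String viewName, String prefix, String suffix) {
        return prefix + viewName + suffix;
    }

    public static void main(String[] args) {
        String[] urls = { "/index.htm", "/static-about.htm", "/termsandconditions.htm" };
        for (String url : urls) {
            String view = viewName(url);
            System.out.println(url + " -> " + view + " -> "
                    + jspPath(view, "/WEB-INF/jsp/", ".jsp"));
        }
    }
}
```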

Hey presto ! We now have a consistent mechanism to serve up static html content as JSP from within the SpringMVC engine. I like symmetry and as such like to have my static and dynamic pages all being managed in a consistent way – but there may be plenty of counter arguments to suggest this approach has problems. (If so I’m very keen to hear them !).

Thirdly the static content such as images and CSS should NOT be inside the WEB-INF folder. Instead, such static resources referenced by the returned JSP pages should live in the WAR root folder ‘where Tomcat can serve them directly’ to the user outside of the controller back-end. As such I have war/img and war/css folders alongside my war/WEB-INF folder. Any html content I generate from my JSP handling framework refers to static resources such as:

<img src="img/application-logo.jpg" alt="some text"/>

which means that User-Agents will resolve the address for the subsidiary resources to:

http://host/application-path/img/application-logo.jpg

and this means that Tomcat can serve up those resources directly from the application without touching the SpringMVC engine. This now means that I have a seamless framework for taking all page requests into my controller architecture (even pages that may be largely static which I still want to be managed in a consistent way), and storing all supporting collateral in a simple place from where they can be served up.
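The way a User-Agent resolves a relative src against the page URL can be reproduced with java.net.URI. The class name StaticResourceUrlSketch and the host and path values are placeholders for illustration only.

```java
import java.net.URI;

// Shows how a relative reference is resolved against the URL of the page that contains it.
final class StaticResourceUrlSketch {

    static String resolve(String pageUrl, String relativeRef) {
        return URI.create(pageUrl).resolve(relativeRef).toString();
    }

    public static void main(String[] args) {
        // A page at the application root resolves img/... under the application path.
        System.out.println(resolve("http://host/application-path/index.htm",
                "img/application-logo.jpg"));
        // A page one level down would resolve the same reference under that sub-path.
        System.out.println(resolve("http://host/application-path/ui/page.htm",
                "img/application-logo.jpg"));
    }
}
```

Because the reference is relative, the resolved address depends on the path of the page that includes it, which is worth keeping in mind if pages are ever served from nested paths.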

This may seem trivial or common sense, but it has taken me a long time to grind through the frustrating combination of too many options offered by the SpringMVC framework and a lack of clear explanations on how to achieve this effectively, including images or other static resources.

Hope this is useful.