Xerces, XmlBeans and XML Schema Import Resolver

April 30, 2010

I’ve been developing a framework to provide XML validation/assurance tools based on a range of parser/validator implementations in Java. A recent experience warranted flagging, given I’ve been surprised at the number or credible examples out there on yonder interweb.

The Headline

XmlBeans 2.5 and Xerces 2.9 behave differently when using a runtime schema/resource resolver. XmlBeans relies on the use of the EntityResolver interface whereas Xerces relies on the LSResourceResolver interface, but the way the XMLSchema <import /> statement is put together causes some consistency problems in terms of how these resolvers are used…

The Goal

Take a WSDL (with inline-schema) and then use that WSDL to validate XML instance documents. You cannot validate directly with a WSDL, and therefore as a developer you spend time messing with tools to extract schema and so forth. This tool will greatly simplify this process among other things. This approach involves unpacking the inline XML Schema documents in such a way that multiple XSD fragments can be utilised at the point of validation.

The Approach

Open the WSDL, extract the schemas taking care to maintain the namespace declarations from the outer WSDL definitions section now that XSD’s will become independent XSD files and unable to inherit through context. Spin up a version of XmlBeans and Xerces, and generate a n-way schema validation report this catering for variations in the level of compliance in any one validating parser.

The Problem

Some of the WSDL documents are very complex. They contain many, sometimes hundreds of hierarchically related XML Schema fragments, each of which ‘imports’ the namespaces of numerous other schema fragments. Clearly when I separate the XSD’s from the WSDL these imports cannot resolve through inherited context so I have to make sure that I can resolve these at runtime when the validating parsers are fed the root schema. Problem is the inline schemas in the WSDL contain only a namespace import:

<import namespace=”http://some.other.namespace&#8221; />

And when this is exported to a standalone XSD file, you need to use a runtime SchemaResolver to acquire the other related XSD files such that the XmlBeans or Xerces engines can assimilate all of the type and namespace set necessary to validate the specified XML instance document. I spent a lot of time diagnosing why my XmlBeans implementation was not triggering the necessary ‘tell me where this schema lives?‘ events, such that element references were remaining unresolved, and the validation of the XML instance document was failing. With exactly the same input artifacts the Xerces resolver was happily going about it’s business and allowing me to feed it schema fragments all the way home.

So Watch Out For This…

If you are using XmlBeans, and you want to use a root XMLSchema to validate an XML document, but if the root XMLSchema contains <import /> statements, then you must have a schemaLocation present.

<import namespace=”http://some.other.namespace&#8221;  schemaLocation=”file:/some.where”/>

Only with the schemaLocation present will the core XmlBeans.compileXsd() function call out to your custom EntityResolver such that you can then map the requested ‘namespace’ to a physical input source. On the other hand the Xerces parser will work with the shorter format which is already present in my

<import namespace=”http://some.other.namespace&#8221; />

Now the only problem here is that I did not want to modify the XMLSchema artifacts I was extracting from the WSDL on the basis that I am offering a service based on the inputs supplied. However, I have been forced into a position of needing to inject a schemaLocation tag into all the XSD’s as I’m extracting them from the parent WSDL. That said I am not specifying a physical location for the schemaLocation, merely reiterating the same generic namespace. I do this to just force the trigger of the resolver events, from which I then use only the supplied namespace=”http://some.other.namespace&#8221; tag in conjunction with some context information that I know in relation to where I put the schemas in my file-system, to resolve and supply the linkage to the required schema. This is sufficiently generic that I don’t deem it to be overly intrusive into the base schema, and I end up with the following modified import statements:

<import namespace=”http://external.namespace&#8221;  schemaLocation=”http://external.namespace”/&gt;

With this format I get uniform behaviour between XmlBeans and Xerces. Both call out to my runtime resolvers which are then able to use a simple mapping scheme to supply the schema content and I now able to operate consistently.

This did take me a long time to diagnose, so hopefully it will be of use to others.

REST – Am I moving from ruby to java…?

December 21, 2007

So I’ve concluded some deep investigations around RESTful service applied to the big nasty Enterprise. I’ve been easing myself back into hacksville with Windows/Ruby/Rails/MySQL and have had an altogeher positive experience. However I’m now at a point of considering my next steps – do I go forward on Ruby/Rails or do I take this opportunity to hop onto an alternative framework.

The answer to this relates to the nature of the challenge in applying REST to the Enteprise as opposed to RESTifying a database or application instance. In the former the primary challenge relates to the formation of resoruces from potentially complex components – not atomic records or aggregations of information from a local-datasource (I’m using a trivial scenarion here as an example – not disresepcting restful applications !!).

 I have hit problems with managing synchronous and asynchronous RESTful interactions within rails to date, and these challenges are purely down to the complex linkage beween a RESTful presentation and a complex implementation landscape.

Whilst this kind or perspective may be off the mainstream from a rails-applcations perspective, it’s core to my world – and as such I benefit less from the excellent front-end rails conventions, given I’m assembling resource representations which don’t exist in an atomic sense, in a local DB.

So the key issues I now face in my next iteration (and the Rails/Java JSR311 question) are:

  1. I need a more seamless mechanism for managing undeterministic concurrency (by virtue of possibly long-running resource operations), and where dynamic threading seems cleaner than pre-defining a mongrel cluster for a known concurrency limit up front – along with a fron-end load-balancer configuration.
  2.  I need to introduce efficient, selective asynchronous interactions based around the 202-Accepted response, enabling the resource server to work on an operation whilst pushing a client away with a URI.
  3. XML Schema and validation support. I understand, to some extent, the position offered by the Ruby/REXML community in that XML Schema is over complicated and useless….and not worth implementing within the REXML library. However XML Schema is a mainstream currency in my integration (Enterprise) landscape and as such I have to deal with it. The lack of support in ruby has hampered my efforts to date.

I stress again that this is not a gripe about ruby/rails by any means – simply an observation that my Enterprise/REST context is emphasising certain integration challenges which may not be front-and-centre in the emerging waves of RESTful applications successfully launching on Rails.

So I’m looking at Jersey (JSR-311 Reference Implementation) and Noelios Restlet as options for my next leap. Both options will cover all 3 of my pinch points (I believe), and the thing I’m unsure about is the extent to which I’ll be exposed on the presentation stuff which the rails convention eats for lunch….

An interesting junction….