Xerces DOMParser funny…

May 1, 2010

Working with the Xerces DOMParser, I noticed a weird exception being reported, seemingly at random, for certain instance documents. The exception message is “The processing instruction target matching "[xX][mM][lL]" is not allowed.” Looking at the XML source file, it does have the normal <?xml version="1.0" encoding="UTF-8"?> prefix, so I was initially unsure why this exception was firing every now and then.

Turns out that with the Xerces (JDK default) DOMParser implementation, the parser throws this exception ONLY if there are blank lines in the XML document before the <?xml version="1.0" encoding="UTF-8"?> declaration.

Removing these blank lines (NOT the XML declaration header) allows the parser to complete successfully. I don’t profess to understand the science bit here, but this did knock me off track for a while. Interested to hear what relevance the blank lines do have in a document where whitespace is irrelevant…?
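For what it's worth, the likely explanation is the XML 1.0 rule that an XML declaration, if present, must be the very first thing in the document. Once anything precedes it, even a blank line, the parser reads <?xml ... ?> as an ordinary processing instruction, and "xml" is a reserved target name, hence the message. Below is a minimal sketch of a workaround, written in Ruby and assuming the document is already in memory as a string. The helper name strip_leading_whitespace is hypothetical and not part of Xerces or any library.

```ruby
# Hypothetical helper (not part of Xerces or the JDK): remove any whitespace
# that precedes the first real character of the document, so that the XML
# declaration sits at the very start again.
def strip_leading_whitespace(xml_text)
  xml_text.sub(/\A\s+/, "")
end

# Made-up sample document with two blank lines before the declaration.
raw = "\n\n<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<order id=\"1\"/>\n"
cleaned = strip_leading_whitespace(raw)

puts cleaned.start_with?("<?xml")
```

This only touches leading whitespace; it does not try to handle a byte order mark or any other stray content ahead of the declaration.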


Ruby and XML Schema…??

December 18, 2007

So there I was looking for the REXML::Document.validate_with_xsd() routine, persuading myself that it must be there somewhere, when suddenly I came to the realisation that it wasn’t! Eh?!

I then happened upon numerous blogs and chat threads clearly explaining why the collective conscience of ‘Ruby’ had thus far deemed XML Schema unfit for inclusion in this ever-expanding scripting language, because… well, XSD is crap! Woo hoo, I’m glad that was settled so convincingly!

Excellent, I thought, if only I were working in an ideological vacuum where XML Schema had been outlawed years ago. But sadly no: the primary currency of integration in my world is XML, declared primarily with XSD and supporting semantic information. I might not ‘like’ it, but it’s there… so I need to exploit it to avoid reinventing it… don’t I? Or else what did I miss?

I noted with interest the justification that, to use XSD effectively within Ruby and make sense of a schema-driven document validator, I’d have to write just as much reactive code as I would if I simply coded the document-specific validation routines by hand. Hmmm… not sure. I think ‘I’ would write a lot MORE code if I attempted that than, say, the seasoned hackers who’re making such assertions with their zen-like one-ness with the syntax.

REXML gives a structural integrity validation in a single line of code:

valid_doc=REXML::Document.new(xml_src)
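Strictly speaking, that one line raises an exception on a broken document rather than handing back a false. A minimal sketch of wrapping it into a yes/no check follows. The helper name well_formed? is hypothetical and not part of REXML.

```ruby
require "rexml/document"

# Hypothetical helper (not part of REXML): true if REXML can parse the text,
# false if it raises a parse error. This checks well-formedness only; it says
# nothing about whether the document satisfies an XSD.
def well_formed?(xml_src)
  REXML::Document.new(xml_src)
  true
rescue REXML::ParseException
  false
end

puts well_formed?("<order><item/></order>")
puts well_formed?("<order><item></order>")
```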

But whilst this gives me a warm feeling that I’m not parsing an alien binary artefact, it doesn’t give me much insight into whether my structurally intact XML document actually obeys any of the rules and constraints laid out in the existing XSDs that my organisation uses to declare at least some aspects of the XML validation logic. So why not just add an additional routine to provide a yes/no, ahead of all the deep and meaningful reasons as to which constraint has got the n-th degree of infringement? What would be the problem in offering another root-level operation such as:

result=valid_doc.validate_with_xsd(xml_schema)

I know there are answers, such as ‘well, extend REXML yourself and submit it!’ or ‘use another language such as… err… Java!’, so no points for phoning in with those, but I’m just perplexed that such a mainstream component as XML Schema, even at its most basic level, has been forcibly ejected by the Ruby community thus far…

I emphasise I don’t see XML Schema as a shining light of pragmatism, but nor do I see the value in completely ignoring one of the primary currencies in a mature integration landscape…