Problems with the DOMParser (s4s-elt-character Error)

I was messing around with the Apache Xerces-based XML DOMParser class (from the com.sun.org.apache.xerces.internal.impl.xs.dom package) for the JTwitt project and noticed some quirky behavior. I used the following snippet of code:

DOMParser parser = new DOMParser();
parser.parse(new InputSource(xmlStream));
Document d = parser.getDocument();

Pretty straightforward stuff; in fact, you can probably find the same few lines in just about every DOMParser tutorial out there. The xmlStream is an InputStream holding the XML data. Where do I get it from? I pull it off Twitter, as I described here. I tested this code earlier and got the XML to print out in the console, so my InputStream is not the issue. Every time I called the parse method, I got a few dozen errors like this:

s4s-elt-character: Non-whitespace characters are not allowed in schema elements other than ‘xs:appinfo’ and ‘xs:documentation’

It was basically one error for each node that had text data as a child. I googled this message for hours, and it seems that no one has a clue what causes it. I’m definitely not the first person to get it, but I have yet to see a working solution.

In the end I decided to abandon DOMParser. There are about a bazillion different ways to parse XML files in Java, so I simply switched to the JAXP parser (javax.xml.parsers). Now my code looks like this:

DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document d = builder.parse(xmlStream);

Both snippets are essentially equivalent and achieve the same thing. So as far as I’m concerned, DocumentBuilder > DOMParser. Still, if anyone has a clue what that s4s-elt-character error is all about, please leave a note in the comments so that future generations do not have to suffer because of it.
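For anyone who wants to try the JAXP route outside of JTwitt, here is a minimal, self-contained sketch of the two-line snippet above. The class name JaxpParseSketch, the firstText helper, and the sample XML string are all made up for illustration; the sample is only a stand-in for whatever the real Twitter response looks like.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class JaxpParseSketch {

    // Parses an XML stream into a DOM Document using the JAXP DocumentBuilder.
    public static Document parse(InputStream xmlStream) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(xmlStream);
    }

    // Hypothetical helper: text of the first element with the given tag name, or null if there is none.
    public static String firstText(Document d, String tagName) {
        NodeList nodes = d.getElementsByTagName(tagName);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample payload; the real stream would come from the network.
        String xml = "<status><text>hello</text><user><name>luke</name></user></status>";
        InputStream xmlStream = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));

        Document d = parse(xmlStream);
        System.out.println(d.getDocumentElement().getNodeName()); // prints: status
        System.out.println(firstText(d, "text"));                 // prints: hello
    }
}
```

Everything used here lives in the javax.xml.parsers and org.w3c.dom packages that ship with the JDK, so no extra JAR files are needed.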





4 Responses to Problems with the DOMParser (s4s-elt-character Error)

  1. Xabi says:

    Hey, have you tried NanoXML? It’s small, very small. I’ve used it to parse config files written in XML and it works great.

  2. Luke says:

    I haven’t used it. Both the Xerces and JAXP parsers ship with the core set of Java libraries, at least in 5.0. This way there is no need to bundle any third-party JAR files with my releases. I didn’t really look into anything that was not built in, but I will check it out.

    Thanks,

  3. Dax says:

    For XML manipulation, I always use JDOM. I find that it’s the best all-around XML library.
