Problems with the DOMParser (s4s-elt-character Error)

I was messing around with the Apache Xerces-based XML DOMParser class (from the com.sun.org.apache.xerces.internal.impl.xs.dom package) for the JTwitt project and I noticed some quirky behavior. I used the following snippet of code:

DOMParser parser = new DOMParser();
parser.parse(new InputSource(xmlStream));
Document d = parser.getDocument();

Pretty straightforward stuff. In fact, you can probably find the same few lines in just about every DOMParser tutorial out there. The xmlStream is an InputStream with the XML data. Where do I get it from? I pull it off Twitter, as I described here. I tested this code before and got the XML to print out in the console, so my InputStream is not the issue here. Every time I called the parse method I got a few dozen errors like this:

s4s-elt-character: Non-whitespace characters are not allowed in schema elements other than ‘xs:appinfo’ and ‘xs:documentation’

It was basically one error for each node that had text data as a child. I googled this message for hours and it seems that no one has a clue what causes it. I’m definitely not the first person who got it, but I have yet to see a working solution.

In the end I decided to abandon DOMParser. There are about a bazillion different ways to parse XML files in Java, so I simply switched to the JAXP parser (javax.xml.parsers). Now my code looks like this:

DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document d = builder.parse(xmlStream);

Both snippets are essentially equivalent: each takes the stream and hands back a DOM Document. So as far as I’m concerned DocumentBuilder > DOMParser. Still, if anyone has a clue what that s4s-elt-character error is all about, please leave a note in the comments so that future generations do not have to suffer because of it.
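For anyone who wants to try the JAXP route on its own, here is a minimal, self-contained sketch of the two lines above wrapped in a runnable class. The class name JaxpParseSketch, the parse helper, and the sample XML string are made up for illustration; the string only stands in for whatever the real xmlStream delivers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class JaxpParseSketch {

    // Parse an XML stream into a DOM Document through the JAXP entry points.
    static Document parse(InputStream xmlStream)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(xmlStream);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data, not the actual Twitter response format.
        String xml = "<statuses><status><text>hello</text></status></statuses>";
        InputStream xmlStream =
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));

        Document d = parse(xmlStream);

        // Print the root element name and the text of the first <text> node.
        System.out.println(d.getDocumentElement().getNodeName());
        System.out.println(d.getElementsByTagName("text").item(0).getTextContent());
    }
}
```

Elements with text children parse without complaint here, which is exactly where the DOMParser snippet was reporting errors.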



4 Responses to Problems with the DOMParser (s4s-elt-character Error)

Xabi says:

June 3, 2007 at 5:33 am

Hey, have you tried NanoXML? It’s small, very small. I’ve used it to parse config files written in XML and it works great.

Luke says:

June 3, 2007 at 3:12 pm

I haven’t used it. Both the Xerces and JAXP parsers ship with the core set of Java libraries, at least in 5.0. This way there is no need to bundle any third-party JAR files with my releases. I didn’t really look into stuff that was not built in, but I will check it out.

Thanks,

Dax says:

June 4, 2007 at 1:49 pm

For XML manipulation, I always use JDOM. I find that it’s the best all-around XML library.

daniele says:

December 29, 2011 at 4:59 am

http://www.herongyang.com/XML-Schema/JAXP-XSD-Schema-File-Loader-Error.html
