public class SimpleXMLParser
extends java.lang.Object
The parser can:
- recognize the <![CDATA[ ... ]]> construct
- map lines ending in \r\n and \r to \n on input, in accordance with the XML Specification, Section 2.11
The code is based on http://www.javaworld.com/javaworld/javatips/javatip128/ with some extra code from XERCES to recognize the encoding.
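The line-ending rule mentioned above (XML Specification, Section 2.11) can be illustrated with a small standalone sketch. The class `LineEndingNormalizer` and its `normalize` method are hypothetical names introduced here for illustration; they are not part of SimpleXMLParser, which applies the rule while reading its input rather than on a complete string.

```java
/** Hypothetical helper, not part of SimpleXMLParser: sketches XML 1.0, Section 2.11. */
final class LineEndingNormalizer {

    /** Returns s with every "\r\n" pair and every lone "\r" replaced by "\n". */
    static String normalize(String s) {
        StringBuilder out = new StringBuilder(s.length());
        boolean lastWasCR = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\r') {
                out.append('\n');       // a CR always produces exactly one LF
                lastWasCR = true;
            } else if (c == '\n') {
                if (!lastWasCR) {
                    out.append('\n');   // a lone LF is kept as-is
                }
                // an LF directly after a CR was already emitted with the CR
                lastWasCR = false;
            } else {
                out.append(c);
                lastWasCR = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Mixed Windows, old Mac and Unix line endings in one input.
        String normalized = normalize("a\r\nb\rc\nd");
        System.out.println(normalized.equals("a\nb\nc\nd"));
    }
}
```

Tracking only whether the previous character was a CR keeps the sketch single-pass, which is also how a streaming parser would have to handle it.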
Modifier and Type | Field and Description |
---|---|
private static int | ATTRIBUTE_EQUAL |
private static int | ATTRIBUTE_LVALUE |
private static int | ATTRIBUTE_RVALUE |
private static int | CDATA |
private static int | CLOSE_TAG |
private static int | COMMENT |
private static int | DOCTYPE |
private static int | DONE |
private static int | ENTITY |
private static java.util.HashMap | entityMap |
private static java.util.HashMap | fIANA2JavaMap |
private static int | IN_TAG |
private static int | OPEN_TAG |
private static int | PRE |
private static int | QUOTE |
private static int | SINGLE_TAG |
private static int | START_TAG |
private static int | TEXT |
Modifier | Constructor and Description |
---|---|
private | SimpleXMLParser() |
Modifier and Type | Method and Description |
---|---|
static char | decodeEntity(java.lang.String s) |
static java.lang.String | escapeXML(java.lang.String s, boolean onlyASCII)<br>Escapes a string with the appropriate XML codes. |
private static void | exc(java.lang.String s, int line, int col) |
private static java.lang.String | getDeclaredEncoding(java.lang.String decl) |
private static java.lang.String | getEncodingName(byte[] b4) |
static java.lang.String | getJavaEncoding(java.lang.String iana)<br>Gets the Java encoding from the IANA encoding. |
static void | parse(SimpleXMLDocHandler doc, java.io.InputStream in)<br>Parses the XML document, firing the events to the handler. |
static void | parse(SimpleXMLDocHandler doc, java.io.Reader r) |
static void | parse(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, java.io.Reader r, boolean html)<br>Parses the XML document, firing the events to the handler. |
private static int | popMode(java.util.Stack st) |
private static final java.util.HashMap fIANA2JavaMap
private static final java.util.HashMap entityMap
private static final int TEXT
private static final int ENTITY
private static final int OPEN_TAG
private static final int CLOSE_TAG
private static final int START_TAG
private static final int ATTRIBUTE_LVALUE
private static final int ATTRIBUTE_EQUAL
private static final int ATTRIBUTE_RVALUE
private static final int QUOTE
private static final int IN_TAG
private static final int SINGLE_TAG
private static final int COMMENT
private static final int DONE
private static final int DOCTYPE
private static final int PRE
private static final int CDATA
private static int popMode(java.util.Stack st)
public static void parse(SimpleXMLDocHandler doc, java.io.InputStream in) throws java.io.IOException
doc - the document handler
in - the document. The encoding is deduced from the stream. The stream is not closed
java.io.IOException - on error
private static java.lang.String getDeclaredEncoding(java.lang.String decl)
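Since getDeclaredEncoding is private and undocumented here, the following is only a sketch of what pulling the encoding out of an XML declaration can look like. The class `DeclaredEncodingSketch`, the method `declaredEncoding`, the regular expression, and the choice to return null when no encoding is declared are all assumptions for illustration, not the parser's actual behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of SimpleXMLParser. */
final class DeclaredEncodingSketch {

    // Matches encoding="name" or encoding='name' inside an XML declaration.
    // Group 2 holds a double-quoted value, group 3 a single-quoted one.
    private static final Pattern ENCODING =
            Pattern.compile("encoding\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    /** Returns the declared encoding name, or null if the declaration has none. */
    static String declaredEncoding(String decl) {
        if (decl == null) {
            return null;
        }
        Matcher m = ENCODING.matcher(decl);
        if (!m.find()) {
            return null;
        }
        return m.group(2) != null ? m.group(2) : m.group(3);
    }

    public static void main(String[] args) {
        System.out.println(declaredEncoding("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"));
    }
}
```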
public static java.lang.String getJavaEncoding(java.lang.String iana)
iana - the IANA encoding
public static void parse(SimpleXMLDocHandler doc, java.io.Reader r) throws java.io.IOException
java.io.IOException
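The IANA-to-Java lookup performed by getJavaEncoding (documented above, and backed by the fIANA2JavaMap field) can be pictured as a simple table lookup. The sketch below is an assumption-laden illustration: the class `IanaToJavaSketch`, the three sample entries, the case-insensitive lookup, and the fallback of returning the input unchanged are choices made here, not confirmed details of the real method.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, not part of SimpleXMLParser. */
final class IanaToJavaSketch {

    // A tiny sample table; the real map is assumed to contain many more entries.
    private static final Map<String, String> IANA_TO_JAVA = new HashMap<>();
    static {
        IANA_TO_JAVA.put("UTF-8", "UTF8");
        IANA_TO_JAVA.put("US-ASCII", "ASCII");
        IANA_TO_JAVA.put("ISO-8859-1", "ISO8859_1");
    }

    /** Maps an IANA charset name to a historical Java encoding name. */
    static String javaEncoding(String iana) {
        String mapped = IANA_TO_JAVA.get(iana.toUpperCase(Locale.ROOT));
        // Assumption: unknown names are passed through unchanged.
        return mapped != null ? mapped : iana;
    }

    public static void main(String[] args) {
        System.out.println(javaEncoding("utf-8"));
    }
}
```

On current JDKs, `java.nio.charset.Charset.forName` accepts IANA names and their aliases directly, so a table like this is mainly needed for older `java.io` style encoding names.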
public static void parse(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, java.io.Reader r, boolean html) throws java.io.IOException
doc - the document handler
r - the document. The encoding is already resolved. The reader is not closed
java.io.IOException - on error
private static void exc(java.lang.String s, int line, int col) throws java.io.IOException
java.io.IOException
public static java.lang.String escapeXML(java.lang.String s, boolean onlyASCII)
s - the string to be escaped
onlyASCII - if true, codes above 127 will always be escaped with &#nn;
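To make the effect of the onlyASCII flag concrete, here is a minimal sketch of this kind of escaping. The class `EscapeXmlSketch` and its `escape` method are hypothetical; in particular, the set of five named entities and the code-point based handling of characters above 127 are assumptions, and the real escapeXML may differ in detail.

```java
/** Hypothetical helper, not part of SimpleXMLParser. */
final class EscapeXmlSketch {

    /** Escapes markup characters; with onlyASCII, also escapes code points above 127. */
    static String escape(String s, boolean onlyASCII) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);        // read a full code point, not a UTF-16 unit
            i += Character.charCount(cp);
            switch (cp) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (onlyASCII && cp > 127) {
                        sb.append("&#").append(cp).append(';');  // decimal character reference
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b & \u00e9", true));
    }
}
```

Iterating by code point rather than by char avoids emitting two separate references for a surrogate pair, which would not be well-formed XML.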
public static char decodeEntity(java.lang.String s)
private static java.lang.String getEncodingName(byte[] b4)
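The b4 parameter suggests that the encoding is guessed from the first four bytes of the stream, in the style of the autodetection appendix of the XML Specification. The sketch below shows that general idea only: the class `EncodingSniffSketch`, the method `guessEncoding`, the patterns checked, and the UTF-8 default are assumptions, and UTF-32 and EBCDIC detection are deliberately left out.

```java
/** Hypothetical helper, not part of SimpleXMLParser. */
final class EncodingSniffSketch {

    /** Guesses an encoding name from up to four leading bytes; defaults to UTF-8. */
    static String guessEncoding(byte[] b4) {
        if (b4 == null || b4.length < 2) {
            return "UTF-8";
        }
        int b0 = b4[0] & 0xFF;
        int b1 = b4[1] & 0xFF;
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";   // byte order mark, big endian
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";   // byte order mark, little endian
        if (b4.length < 3) {
            return "UTF-8";
        }
        int b2 = b4[2] & 0xFF;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";  // UTF-8 byte order mark
        if (b4.length < 4) {
            return "UTF-8";
        }
        int b3 = b4[3] & 0xFF;
        // "<?" encoded as UTF-16 without a byte order mark.
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";
        return "UTF-8";
    }

    public static void main(String[] args) {
        byte[] start = {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00};
        System.out.println(guessEncoding(start));
    }
}
```

A guess like this only selects a decoder for reading the XML declaration; an explicit encoding declared there would normally take precedence.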