org.jdesktop.dom
Class SimpleHtmlDocumentBuilder

java.lang.Object
  extended by javax.xml.parsers.DocumentBuilder
      extended by org.jdesktop.dom.SimpleHtmlDocumentBuilder

public class SimpleHtmlDocumentBuilder
extends DocumentBuilder

An HTML DOM DocumentBuilder implementation that does not require the factory pattern for creation. Most of the time calling one of the static simpleParse methods is all that is required.

This implementation requires a normal DOM parser. It is not suitable for parsing arbitrary HTML documents, even those that conform to the various HTML specifications. Rather, it requires a preprocessor to first clean up the HTML so that it can be parsed into a DOM.
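
The restriction above can be illustrated with the JDK's own JAXP parser, without this library. The following is a minimal sketch, assuming the default DocumentBuilder behaves like the parser this class delegates to; WellFormedCheck and isWellFormed are hypothetical names introduced only for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of org.jdesktop.dom.
final class WellFormedCheck {
    /** Returns true if the markup parses as XML with the default JAXP DOM parser. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // the markup is not well-formed XML
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Acceptable HTML, but <br> is never closed, so an XML parser rejects it.
        System.out.println(isWellFormed("<html><body><p>Hello<br></p></body></html>")); // prints false
        // The same content cleaned up as XHTML.
        System.out.println(isWellFormed("<html><body><p>Hello<br/></p></body></html>")); // prints true
    }
}
```

A check like this can serve as a quick gate after the preprocessing step and before the cleaned markup is handed to a DOM parser.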


Constructor Summary
SimpleHtmlDocumentBuilder()
          Create a new SimpleHtmlDocumentBuilder.
 
Method Summary
 DOMImplementation getDOMImplementation()
          Obtain an instance of a DOMImplementation object.
 Schema getSchema()
          Get a reference to the Schema being used by the XML processor.
 boolean isNamespaceAware()
          Indicates whether or not this parser is configured to understand namespaces.
 boolean isValidating()
          Indicates whether or not this parser is configured to validate XML documents.
 boolean isXIncludeAware()
          Get the XInclude processing mode for this parser.
 SimpleHtmlDocument newDocument()
          Obtain a new instance of a DOM Document object to build a DOM tree with.
 SimpleHtmlDocument parse(File f)
          Parse the content of the given file as an XML document and return a new DOM Document object.
 SimpleHtmlDocument parse(InputSource is)
          Parse the content of the given input source as an XML document and return a new DOM Document object.
 SimpleHtmlDocument parse(InputStream is)
          Parse the content of the given InputStream as an XML document and return a new DOM Document object.
 SimpleHtmlDocument parse(InputStream is, String systemId)
          Parse the content of the given InputStream as an XML document and return a new DOM Document object.
 SimpleHtmlDocument parse(String uri)
          Parse the content of the given URI as an XML document and return a new DOM Document object.
 SimpleHtmlDocument parseString(String html)
          Parse the content of the given String as an XML document and return a new HTML DOM SimpleHtmlDocument object.
 void reset()
          Reset this DocumentBuilder to its original configuration.
 void setEntityResolver(EntityResolver er)
          Specify the EntityResolver to be used to resolve entities present in the XML document to be parsed.
 void setErrorHandler(ErrorHandler eh)
          Specify the ErrorHandler to be used by the parser.
static SimpleHtmlDocument simpleParse(InputSource is)
          Parse the content of the given input source as an XML document and return a new HTML DOM SimpleHtmlDocument object.
static SimpleHtmlDocument simpleParse(InputStream in)
          Parse the content of the given InputStream as an XML document and return a new HTML DOM SimpleHtmlDocument object.
static SimpleHtmlDocument simpleParse(String xml)
          Parse the content of the given String as an XML document and return a new HTML DOM SimpleHtmlDocument object.
static SimpleHtmlDocument simpleParse(URL url)
          Parse the content of the given URL as an XML document and return a new HTML DOM SimpleHtmlDocument object.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SimpleHtmlDocumentBuilder

public SimpleHtmlDocumentBuilder()
Create a new SimpleHtmlDocumentBuilder. SimpleHtmlDocumentBuilder will delegate parsing to the default DocumentBuilder constructed via the default DocumentBuilderFactory.
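
The delegation described above can be sketched with a plain DocumentBuilder subclass. This is an illustration of the pattern only, not the library's source: DelegatingBuilder is a hypothetical class, and it returns a plain org.w3c.dom.Document rather than a SimpleHtmlDocument.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class for illustration; usable with "new", no factory needed.
final class DelegatingBuilder extends DocumentBuilder {
    private final DocumentBuilder delegate;

    DelegatingBuilder() {
        try {
            // The default builder from the default factory does the real work.
            delegate = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No default DocumentBuilder available", e);
        }
    }

    @Override public Document parse(InputSource is) throws SAXException, IOException {
        if (is == null) throw new IllegalArgumentException("InputSource must not be null");
        return delegate.parse(is);
    }
    @Override public boolean isNamespaceAware() { return delegate.isNamespaceAware(); }
    @Override public boolean isValidating() { return delegate.isValidating(); }
    @Override public void setEntityResolver(EntityResolver er) { delegate.setEntityResolver(er); }
    @Override public void setErrorHandler(ErrorHandler eh) { delegate.setErrorHandler(eh); }
    @Override public Document newDocument() { return delegate.newDocument(); }
    @Override public DOMImplementation getDOMImplementation() { return delegate.getDOMImplementation(); }

    public static void main(String[] args) throws Exception {
        Document doc = new DelegatingBuilder()
                .parse(new InputSource(new StringReader("<html><body/></html>")));
        System.out.println(doc.getDocumentElement().getTagName()); // prints html
    }
}
```

Only the seven abstract methods of DocumentBuilder need to be implemented; the inherited parse overloads for File, InputStream and String funnel into parse(InputSource).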

Method Detail

parseString

public SimpleHtmlDocument parseString(String html)
                               throws SAXException,
                                      IOException

Parse the content of the given String as an XML document and return a new HTML DOM SimpleHtmlDocument object. An IllegalArgumentException is thrown if the String is null.

NOTE: this implementation requires a normal DOM parser. It is not suitable for parsing arbitrary HTML documents, even those that conform to the various HTML specifications. Rather, it requires a preprocessor to first clean up the HTML so that it can be parsed into a DOM.

Parameters:
html - String containing the content to be parsed. Must be valid XHTML.
Returns:
SimpleHtmlDocument result of parsing the String
Throws:
IOException - If any IO errors occur.
SAXException - If any parse errors occur.
IllegalArgumentException - When html is null
See Also:
DocumentHandler
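
Parsing from a String with the standard JAXP API usually means wrapping the text in an InputSource backed by a StringReader. The sketch below shows that approach together with the null check described above; it is an assumption about how such a method could be written, and StringParseSketch is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper built on the JDK parser only.
final class StringParseSketch {
    static Document parseString(String xhtml) throws SAXException, IOException {
        if (xhtml == null) throw new IllegalArgumentException("xhtml must not be null");
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // The text is already decoded, so a Reader avoids any charset detection.
            return builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseString("<html><head><title>Hi</title></head><body/></html>");
        System.out.println(doc.getElementsByTagName("title").item(0).getTextContent()); // prints Hi
    }
}
```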

parse

public SimpleHtmlDocument parse(InputSource is)
                         throws SAXException,
                                IOException
Description copied from class: javax.xml.parsers.DocumentBuilder
Parse the content of the given input source as an XML document and return a new DOM Document object. An IllegalArgumentException is thrown if the InputSource is null.

Specified by:
parse in class DocumentBuilder
Parameters:
is - InputSource containing the content to be parsed.
Returns:
A new DOM Document object.
Throws:
SAXException - If any parse errors occur.
IOException - If any IO errors occur.
See Also:
DocumentHandler

parse

public SimpleHtmlDocument parse(InputStream is)
                         throws SAXException,
                                IOException
Description copied from class: javax.xml.parsers.DocumentBuilder
Parse the content of the given InputStream as an XML document and return a new DOM Document object. An IllegalArgumentException is thrown if the InputStream is null.

Overrides:
parse in class DocumentBuilder
Parameters:
is - InputStream containing the content to be parsed.
Returns:
Document result of parsing the InputStream
Throws:
SAXException - If any parse errors occur.
IOException - If any IO errors occur.
See Also:
DocumentHandler

parse

public SimpleHtmlDocument parse(InputStream is,
                                String systemId)
                         throws SAXException,
                                IOException
Description copied from class: javax.xml.parsers.DocumentBuilder
Parse the content of the given InputStream as an XML document and return a new DOM Document object. An IllegalArgumentException is thrown if the InputStream is null.

Overrides:
parse in class DocumentBuilder
Parameters:
is - InputStream containing the content to be parsed.
systemId - Provide a base for resolving relative URIs.
Returns:
A new DOM Document object.
Throws:
SAXException - If any parse errors occur.
IOException - If any IO errors occur.
See Also:
DocumentHandler

parse

public SimpleHtmlDocument parse(String uri)
                         throws SAXException,
                                IOException
Description copied from class: javax.xml.parsers.DocumentBuilder
Parse the content of the given URI as an XML document and return a new DOM Document object. An IllegalArgumentException is thrown if the URI is null.

Overrides:
parse in class DocumentBuilder
Parameters:
uri - The location of the content to be parsed.
Returns:
A new DOM Document object.
Throws:
SAXException - If any parse errors occur.
IOException - If any IO errors occur.
See Also:
DocumentHandler

parse

public SimpleHtmlDocument parse(File f)
                         throws SAXException,
                                IOException
Description copied from class: javax.xml.parsers.DocumentBuilder
Parse the content of the given file as an XML document and return a new DOM Document object. An IllegalArgumentException is thrown if the File is null.

Overrides:
parse in class DocumentBuilder
Parameters:
f - The file containing the XML to parse.
Returns:
A new DOM Document object.
Throws:
SAXException - If any parse errors occur.
IOException - If any IO errors occur.
See Also:
DocumentHandler

isNamespaceAware

public boolean isNamespaceAware()
Description copied from class: javax.xml.parsers.DocumentBuilder
Indicates whether or not this parser is configured to understand namespaces.

Specified by:
isNamespaceAware in class DocumentBuilder
Returns:
true if this parser is configured to understand namespaces; false otherwise.
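
What namespace awareness changes in practice can be seen by parsing the same XHTML twice with the JDK parser. This is a sketch under the assumption that the default JAXP implementation is in use; NamespaceAwareSketch and parseRoot are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class using only the JDK.
final class NamespaceAwareSketch {
    static final String XHTML =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body/></html>";

    /** Parses XHTML with the requested namespace setting and returns the root element. */
    static Element parseRoot(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // The builder reports the setting it was created with.
        if (builder.isNamespaceAware() != namespaceAware) {
            throw new IllegalStateException("builder does not reflect factory setting");
        }
        return builder.parse(new InputSource(new StringReader(XHTML))).getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        // Namespace-aware: the xmlns declaration is bound to the element.
        System.out.println(parseRoot(true).getNamespaceURI());
        // Not namespace-aware: xmlns is treated as an ordinary attribute.
        System.out.println(parseRoot(false).getNamespaceURI());
    }
}
```

Code that looks elements up with getElementsByTagNameNS or namespace-qualified XPath expressions generally needs a namespace-aware parser.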

isValidating

public boolean isValidating()
Description copied from class: javax.xml.parsers.DocumentBuilder
Indicates whether or not this parser is configured to validate XML documents.

Specified by:
isValidating in class DocumentBuilder
Returns:
true if this parser is configured to validate XML documents; false otherwise.

setEntityResolver

public void setEntityResolver(EntityResolver er)
Description copied from class: javax.xml.parsers.DocumentBuilder
Specify the EntityResolver to be used to resolve entities present in the XML document to be parsed. Setting this to null will result in the underlying implementation using its own default implementation and behavior.

Specified by:
setEntityResolver in class DocumentBuilder
Parameters:
er - The EntityResolver to be used to resolve entities present in the XML document to be parsed.
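
A common reason to set an EntityResolver when parsing XHTML is to keep the parser from downloading the DTD named in the DOCTYPE. The following is a minimal sketch using the JDK parser; OfflineResolverSketch and parseOffline are hypothetical names, and answering every request with empty content is an illustrative choice, not a recommendation from this library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper using only the JDK.
final class OfflineResolverSketch {
    static Document parseOffline(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Answer every external entity request with empty content instead of a fetch.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        String page = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html><body><p>ok</p></body></html>";
        Document doc = parseOffline(page);
        System.out.println(doc.getElementsByTagName("p").item(0).getTextContent()); // prints ok
    }
}
```

With an empty DTD, named entities that the real DTD would define (such as `&nbsp;`) are no longer declared, so documents using them fail to parse. A resolver that serves a local copy of the DTD avoids that trade-off.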

setErrorHandler

public void setErrorHandler(ErrorHandler eh)
Description copied from class: javax.xml.parsers.DocumentBuilder
Specify the ErrorHandler to be used by the parser. Setting this to null will result in the underlying implementation using its own default implementation and behavior.

Specified by:
setErrorHandler in class DocumentBuilder
Parameters:
eh - The ErrorHandler to be used by the parser.
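
An ErrorHandler is a convenient place to collect diagnostics when preprocessed HTML still fails to parse. The sketch below records each reported problem with its line number; CollectingErrorHandler and check are hypothetical names, and the example uses the JDK parser rather than this class.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical handler that records problems instead of printing them.
final class CollectingErrorHandler implements ErrorHandler {
    final List<String> problems = new ArrayList<>();

    @Override public void warning(SAXParseException e) { problems.add(describe("warning", e)); }
    @Override public void error(SAXParseException e) { problems.add(describe("error", e)); }
    @Override public void fatalError(SAXParseException e) throws SAXException {
        problems.add(describe("fatal", e));
        throw e; // well-formedness errors are not recoverable, so stop parsing
    }

    private static String describe(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber() + ": " + e.getMessage();
    }

    /** Parses the markup and returns every problem the parser reported. */
    static List<String> check(String markup) throws ParserConfigurationException, IOException {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(handler);
        try {
            builder.parse(new InputSource(new StringReader(markup)));
        } catch (SAXException alreadyRecorded) {
            // The handler has stored the details; nothing more to do here.
        }
        return handler.problems;
    }

    public static void main(String[] args) throws Exception {
        // <b> is never closed, so the parser reports a fatal error.
        check("<p><b>x</p>").forEach(System.out::println);
    }
}
```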

newDocument

public SimpleHtmlDocument newDocument()
Description copied from class: javax.xml.parsers.DocumentBuilder
Obtain a new instance of a DOM Document object to build a DOM tree with.

Specified by:
newDocument in class DocumentBuilder
Returns:
A new instance of a DOM Document object.
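
Building a tree on an empty document uses only the standard org.w3c.dom factory methods. The sketch below does this with the JDK's DocumentBuilder; with this class the returned object would be a SimpleHtmlDocument, which the example does not rely on. NewDocumentSketch and build are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper using only the JDK.
final class NewDocumentSketch {
    /** Builds html > body > p with the given paragraph text. */
    static Document build(String text) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element p = doc.createElement("p");
        p.setTextContent(text);
        body.appendChild(p);
        html.appendChild(body);
        doc.appendChild(html); // a document can hold only one root element
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = build("Hello");
        System.out.println(doc.getDocumentElement().getTagName()); // prints html
        System.out.println(doc.getDocumentElement().getTextContent()); // prints Hello
    }
}
```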

getDOMImplementation

public DOMImplementation getDOMImplementation()
Description copied from class: javax.xml.parsers.DocumentBuilder
Obtain an instance of a DOMImplementation object.

Specified by:
getDOMImplementation in class DocumentBuilder
Returns:
A new instance of a DOMImplementation.
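
A DOMImplementation is mainly useful when a new document needs a DOCTYPE or a namespaced root element, which newDocument() alone does not provide. The sketch below creates an empty XHTML 1.0 Strict document with the JDK parser's implementation; DomImplementationSketch and newXhtmlDocument are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

// Hypothetical helper using only the JDK.
final class DomImplementationSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    static Document newXhtmlDocument() throws ParserConfigurationException {
        DOMImplementation impl =
                DocumentBuilderFactory.newInstance().newDocumentBuilder().getDOMImplementation();
        DocumentType doctype = impl.createDocumentType(
                "html",
                "-//W3C//DTD XHTML 1.0 Strict//EN",
                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd");
        // Creates the document, its doctype and the namespaced root element in one step.
        return impl.createDocument(XHTML_NS, "html", doctype);
    }

    public static void main(String[] args) throws Exception {
        Document doc = newXhtmlDocument();
        System.out.println(doc.getDoctype().getPublicId());
        System.out.println(doc.getDocumentElement().getNamespaceURI());
    }
}
```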

reset

public void reset()
Description copied from class: javax.xml.parsers.DocumentBuilder

Reset this DocumentBuilder to its original configuration.

DocumentBuilder is reset to the same state as when it was created with DocumentBuilderFactory.newDocumentBuilder(). reset() is designed to allow the reuse of existing DocumentBuilders thus saving resources associated with the creation of new DocumentBuilders.

The reset DocumentBuilder is not guaranteed to have the same EntityResolver or ErrorHandler Objects, e.g. Object.equals(Object obj). It is guaranteed to have a functionally equal EntityResolver and ErrorHandler.

Overrides:
reset in class DocumentBuilder
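
Reuse with reset() can be sketched as follows, using the JDK's default builder. ResetSketch and countParagraphs are hypothetical names. Note that the base DocumentBuilder.reset() throws UnsupportedOperationException unless the implementation overrides it, which the JDK's bundled parser does.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper using only the JDK.
final class ResetSketch {
    /** Counts p elements, then returns the builder to its initial configuration. */
    static int countParagraphs(DocumentBuilder builder, String xhtml)
            throws SAXException, IOException {
        try {
            return builder.parse(new InputSource(new StringReader(xhtml)))
                    .getElementsByTagName("p").getLength();
        } finally {
            builder.reset(); // drops any handlers set since creation, ready for reuse
        }
    }

    public static void main(String[] args) throws Exception {
        // One builder instance serves several documents.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        System.out.println(countParagraphs(builder, "<body><p>a</p><p>b</p></body>")); // prints 2
        System.out.println(countParagraphs(builder, "<body><p>c</p></body>")); // prints 1
    }
}
```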

getSchema

public Schema getSchema()
Description copied from class: javax.xml.parsers.DocumentBuilder

Get a reference to the Schema being used by the XML processor.

If no schema is being used, null is returned.

Overrides:
getSchema in class DocumentBuilder
Returns:
Schema being used or null if none in use
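
With the standard JAXP API, the schema is configured on the factory and the builder simply reports it. The sketch below shows both outcomes using the JDK parser; SchemaSketch and schemaOf are hypothetical names, and the empty Schema from SchemaFactory.newSchema() is used only because it needs no schema file.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

// Hypothetical helper using only the JDK.
final class SchemaSketch {
    /** Returns the schema reported by a builder created from the given factory. */
    static Schema schemaOf(DocumentBuilderFactory factory) throws ParserConfigurationException {
        return factory.newDocumentBuilder().getSchema();
    }

    public static void main(String[] args) throws Exception {
        // No schema configured: getSchema() returns null.
        System.out.println(schemaOf(DocumentBuilderFactory.newInstance()) == null);

        // Schema configured on the factory: the builder hands back the same object.
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema();
        DocumentBuilderFactory withSchema = DocumentBuilderFactory.newInstance();
        withSchema.setNamespaceAware(true); // schema validation relies on namespaces
        withSchema.setSchema(schema);
        System.out.println(schemaOf(withSchema) == schema);
    }
}
```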

isXIncludeAware

public boolean isXIncludeAware()
Description copied from class: javax.xml.parsers.DocumentBuilder

Get the XInclude processing mode for this parser.

Overrides:
isXIncludeAware in class DocumentBuilder
Returns:
The value returned by DocumentBuilderFactory.isXIncludeAware() when this parser was created from the factory.
See Also:
DocumentBuilderFactory.setXIncludeAware(boolean)
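
The relationship between the factory setting and the builder's reported mode can be sketched with the JDK parser. XIncludeSketch and builderIsXIncludeAware are hypothetical names; a JAXP implementation without XInclude support may throw UnsupportedOperationException from setXIncludeAware(true).

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper using only the JDK.
final class XIncludeSketch {
    /** Creates a builder with the requested XInclude setting and reports its mode. */
    static boolean builderIsXIncludeAware(boolean enable) throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // xi:include elements are identified by namespace
        factory.setXIncludeAware(enable);
        return factory.newDocumentBuilder().isXIncludeAware();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(builderIsXIncludeAware(false));
        System.out.println(builderIsXIncludeAware(true));
    }
}
```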

simpleParse

public static SimpleHtmlDocument simpleParse(InputSource is)
                                      throws SAXException,
                                             IOException

Parse the content of the given input source as an XML document and return a new HTML DOM SimpleHtmlDocument object. An IllegalArgumentException is thrown if the InputSource is null.

NOTE: this implementation requires a normal DOM parser. It is not suitable for parsing arbitrary HTML documents, even those that conform to the various HTML specifications. Rather, it requires a preprocessor to first clean up the HTML so that it can be parsed into a DOM.

Parameters:
is - InputSource containing the content to be parsed.
Returns:
A new DOM SimpleHtmlDocument object.
Throws:
IOException - If any IO errors occur.
SAXException - If any parse errors occur.
IllegalArgumentException - When is is null
See Also:
DocumentHandler

simpleParse

public static SimpleHtmlDocument simpleParse(InputStream in)
                                      throws SAXException,
                                             IOException

Parse the content of the given InputStream as an XML document and return a new HTML DOM SimpleHtmlDocument object. An IllegalArgumentException is thrown if the InputStream is null.

NOTE: this implementation requires a normal DOM parser. It is not suitable for parsing arbitrary HTML documents, even those that conform to the various HTML specifications. Rather, it requires a preprocessor to first clean up the HTML so that it can be parsed into a DOM.

Parameters:
in - InputStream containing the content to be parsed.
Returns:
SimpleHtmlDocument result of parsing the InputStream
Throws:
IOException - If any IO errors occur.
SAXException - If any parse errors occur.
IllegalArgumentException - When in is null
See Also:
DocumentHandler
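
When the input is a byte stream, the XML parser determines the character encoding itself, from the XML declaration or by defaulting to UTF-8. The sketch below shows stream parsing with the JDK parser and try-with-resources; StreamParseSketch is a hypothetical name and the example does not use this class's static method.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper using only the JDK.
final class StreamParseSketch {
    static Document parse(InputStream in) throws SAXException, IOException {
        if (in == null) throw new IllegalArgumentException("in must not be null");
        try {
            // The parser reads raw bytes and decodes them (UTF-8 unless declared otherwise).
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = "<html><body><p>caf\u00e9</p></body></html>"
                .getBytes(StandardCharsets.UTF_8);
        // The caller owns the stream, so the caller closes it.
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            Document doc = parse(in);
            System.out.println(doc.getElementsByTagName("p").item(0).getTextContent());
        }
    }
}
```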

simpleParse

public static SimpleHtmlDocument simpleParse(URL url)
                                      throws SAXException,
                                             IOException

Parse the content of the given URL as an XML document and return a new HTML DOM SimpleHtmlDocument object. An IllegalArgumentException is thrown if the URL is null.

NOTE: this implementation requires a normal DOM parser. It is not suitable for parsing arbitrary HTML documents, even those that conform to the various HTML specifications. Rather, it requires a preprocessor to first clean up the HTML so that it can be parsed into a DOM.

Parameters:
url - The location of the content to be parsed.
Returns:
A new DOM SimpleHtmlDocument object.
Throws:
IOException - If any IO errors occur.
SAXException - If any parse errors occur.
IllegalArgumentException - When url is null
See Also:
DocumentHandler
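
Parsing from a URL with the standard API typically means opening the stream and passing the URL along as the system ID, so that relative references in the document resolve against it. This is a sketch of that approach with the JDK parser, not this class's implementation; UrlParseSketch is a hypothetical name, and the example reads a temporary local file so that it runs without network access.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper using only the JDK.
final class UrlParseSketch {
    static Document parse(URL url) throws SAXException, IOException {
        if (url == null) throw new IllegalArgumentException("url must not be null");
        try (InputStream in = url.openStream()) {
            // The system ID gives the parser a base for resolving relative URIs.
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(in, url.toExternalForm());
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Path page = Files.createTempFile("page", ".xhtml");
        try {
            Files.write(page, "<html><body><p>local</p></body></html>"
                    .getBytes(StandardCharsets.UTF_8));
            Document doc = parse(page.toUri().toURL());
            System.out.println(doc.getElementsByTagName("p").item(0).getTextContent()); // prints local
        } finally {
            Files.deleteIfExists(page);
        }
    }
}
```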

simpleParse

public static SimpleHtmlDocument simpleParse(String xml)
                                      throws SAXException,
                                             IOException

Parse the content of the given String as an XML document and return a new HTML DOM SimpleHtmlDocument object. An IllegalArgumentException is thrown if the String is null.

NOTE: this implementation requires a normal DOM parser. It is not suitable for parsing arbitrary HTML documents, even those that conform to the various HTML specifications. Rather, it requires a preprocessor to first clean up the HTML so that it can be parsed into a DOM.

Parameters:
xml - String containing the content to be parsed.
Returns:
SimpleHtmlDocument result of parsing the String
Throws:
IOException - If any IO errors occur.
SAXException - If any parse errors occur.
IllegalArgumentException - When xml is null
See Also:
DocumentHandler