XML Parsing with DOM in Java

In my blog post XML Parsing With DOM in C++, I used the Xerces-C++ XML Parser as the foundation for the XML parsing API. The classes from that article can also be implemented in Java. The difference is that Java includes built-in support for XML parsing with both the SAX and DOM models, so no third-party parser library is needed.

You can read up on the specifics of the DOM model in my previous article, so let’s dive right into the API code.

Input File

The XML file we’ll use to test the XML parsing API is the bookstore example from before, saved as bookstore.xml:

<bookstore>
    <book category="cooking">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentis</author>
        <year>2005</year>
        <price>30.00</price>
    </book>
    <book category="children">
        <title lang="en">Harry Potter and the Half-Blood Prince</title>
        <author>J. K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
    </book>
</bookstore>

After parsing this file, we want to be able to find the number of parent XML elements with a given tag, the attributes of a specified parent, and the values of the child elements it contains.

In the bookstore XML example file, there are 2 parent elements with a tag of “book”.  Each book has a “category” attribute and 4 child elements with tags: “title”, “author”, “year” and “price”.
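The parent/child structure described above can be checked directly with the DOM API. The following is a minimal sketch, not part of the project code: the class name ChildElements and the helper childTags() are hypothetical, and the bookstore XML is inlined as a string so the example needs no file. One detail it shows is that getChildNodes() also returns the whitespace text nodes produced by the file’s indentation, so the element children have to be picked out by node type.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ChildElements {
    // The bookstore example, inlined so the sketch does not need bookstore.xml
    public static final String BOOKSTORE =
            "<bookstore>\n"
            + "    <book category=\"cooking\">\n"
            + "        <title lang=\"en\">Everyday Italian</title>\n"
            + "        <author>Giada De Laurentis</author>\n"
            + "        <year>2005</year>\n"
            + "        <price>30.00</price>\n"
            + "    </book>\n"
            + "    <book category=\"children\">\n"
            + "        <title lang=\"en\">Harry Potter and the Half-Blood Prince</title>\n"
            + "        <author>J. K. Rowling</author>\n"
            + "        <year>2005</year>\n"
            + "        <price>29.99</price>\n"
            + "    </book>\n"
            + "</bookstore>\n";

    // Parses XML held in a string (via an InputSource) into a DOM Document
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Tag names of the direct element children of parent, skipping the
    // whitespace text nodes that sit between the indented tags
    public static List<String> childTags(Element parent) {
        List<String> tags = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                tags.add(node.getNodeName());
            }
        }
        return tags;
    }

    public static void main(String[] args) throws Exception {
        NodeList books = parse(BOOKSTORE).getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            System.out.println(book.getAttribute("category") + " " + childTags(book)
                    + " (" + book.getChildNodes().getLength() + " child nodes in total)");
        }
    }
}
```

For each book it prints the category followed by [title, author, year, price]; the total child-node count is larger than 4 because the whitespace text nodes are included.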

XML DOM Parsing API

The XML DOM parsing API consists of the same two classes as last time, namely the XmlDOMParser and XmlDOMDocument classes.

  1. XmlDOMParser - encapsulates the Java API to parse XML with the DOM model.
  2. XmlDOMDocument - uses the XmlDOMParser to parse a given document and provides methods for retrieving XML element values from this document.

XmlDOMParser Class

Java provides the DocumentBuilderFactory class, which creates the DocumentBuilder objects used to parse XML files. The XmlDOMParser class creates a DocumentBuilder in its constructor and encapsulates it.

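import org.w3c.dom.Document;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.File;
import java.io.IOException;

public class XmlDOMParser {
    // DOM parser created once and reused for every call to parse()
    private final DocumentBuilder m_db;

    public XmlDOMParser() throws ParserConfigurationException {
        m_db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Parses the named file into a DOM Document. Passing a File lets the
    // DocumentBuilder open and close the input stream itself.
    public Document parse(String xmlfile) throws SAXException, IOException {
        return m_db.parse(new File(xmlfile));
    }
}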

The parse() method uses the DocumentBuilder to parse the specified XML file and returns a DOM Document object if successful.
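parse() only returns a Document when the input is well-formed. Otherwise DocumentBuilder.parse() throws a SAXException; with the JDK’s built-in parser this is typically a SAXParseException, which carries the line and column of the error. A minimal sketch of both cases, assuming a hypothetical parseString() helper that feeds the parser a string through an InputSource instead of a file:

```java
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;

public class ParseFailureDemo {
    // Hypothetical helper: parses XML held in a string rather than a file
    public static Document parseString(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Well-formed input: a Document is returned
        Document doc = parseString("<bookstore><book category=\"cooking\"/></bookstore>");
        System.out.println("parsed root: " + doc.getDocumentElement().getTagName());

        // Malformed input: <book> is never closed, so parsing fails
        try {
            parseString("<bookstore><book></bookstore>");
        } catch (SAXParseException e) {
            // Report where in the input the parser gave up
            System.out.println("line " + e.getLineNumber() + ", column "
                    + e.getColumnNumber() + ": " + e.getMessage());
        }
    }
}
```

The first line printed is "parsed root: bookstore"; the second reports the position and message of the mismatched end tag.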

XmlDOMDocument Class

The XmlDOMDocument constructor accepts an XmlDOMParser object and the name of the XML file to parse. It calls the XmlDOMParser parse() method to get a Document object that contains the fully parsed XML document.

import org.w3c.dom.*;

public class XmlDOMDocument {
    // Fully parsed DOM tree of the XML file
    private final Document m_doc;

    public XmlDOMDocument(XmlDOMParser parser, String xmlfile) throws Exception {
        m_doc = parser.parse(xmlfile);
    }

    // Number of elements in the document with the given tag
    public int getElementCount(String elementTag) {
        NodeList nodes = m_doc.getElementsByTagName(elementTag);
        return nodes.getLength();
    }

    // Text of the first childTag element inside the parentIndex-th parentTag
    // element, or "" if either element is missing or has no character data
    public String getChildValue(String parentTag, int parentIndex, String childTag) {
        NodeList nodes = m_doc.getElementsByTagName(parentTag);
        Element element = (Element) nodes.item(parentIndex);
        if (element == null) {
            return "";
        }
        NodeList list = element.getElementsByTagName(childTag);
        Element field = (Element) list.item(0);
        if (field == null) {
            return "";
        }
        Node child = field.getFirstChild();
        if (child instanceof CharacterData) {
            CharacterData cd = (CharacterData) child;
            return cd.getData();
        }
        return "";
    }

    // Value of the named attribute on the elementIndex-th elementTag element,
    // or "" if the element or the attribute does not exist
    public String getAttributeValue(String elementTag, int elementIndex,
                                    String attributeTag) {
        NodeList nodes = m_doc.getElementsByTagName(elementTag);
        Element element = (Element) nodes.item(elementIndex);
        return (element == null) ? "" : element.getAttribute(attributeTag);
    }
}
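Because XmlDOMDocument receives its parser as a constructor argument, one XmlDOMParser (and so one DocumentBuilder) can presumably serve several documents in turn. A minimal sketch of that reuse with a plain DocumentBuilder; the class and method names are hypothetical. The DocumentBuilder Javadoc does not guarantee thread safety, so a shared builder like this should be used from one thread at a time.

```java
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.StringReader;

public class ParserReuseDemo {
    // One builder shared by every parse, much like the one held by XmlDOMParser.
    // Not guaranteed thread-safe, so use it from a single thread at a time.
    private static final DocumentBuilder BUILDER = newBuilder();

    private static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses an XML string with the shared builder and returns the root tag
    public static String rootTag(String xml) throws Exception {
        Document doc = BUILDER.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTagName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootTag("<bookstore><book/></bookstore>")); // bookstore
        System.out.println(rootTag("<library><book/></library>"));     // library
    }
}
```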

Get Element Count

A DOM document is a tree of nodes, and Document.getElementsByTagName() returns a NodeList of every element in that tree with the specified tag, in document order. The element count is simply the length of that list, given by NodeList.getLength().
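A small sketch of how getElementsByTagName() behaves, using a trimmed, inlined copy of the bookstore data (the class name ElementCountDemo is hypothetical). The search covers the whole tree, so nested elements such as title are counted too, and the DOM wildcard "*" matches every element, including the root.

```java
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;

public class ElementCountDemo {
    // Trimmed copy of the bookstore data, inlined to avoid reading a file
    public static final String XML =
            "<bookstore>"
            + "<book category=\"cooking\"><title>Everyday Italian</title></book>"
            + "<book category=\"children\"><title>Harry Potter and the Half-Blood Prince</title></book>"
            + "</bookstore>";

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Same logic as XmlDOMDocument.getElementCount(), applied to any Document
    public static int count(Document doc, String tag) {
        return doc.getElementsByTagName(tag).getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(count(doc, "book"));     // 2
        System.out.println(count(doc, "title"));    // 2 (nested inside each book)
        System.out.println(count(doc, "*"));        // 5 (bookstore + 2 books + 2 titles)
        System.out.println(count(doc, "magazine")); // 0 (no such tag)
    }
}
```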

Get Child Element Value

To get a child element value, getChildValue() first looks up the node list for the specified parent tag with another call to Document.getElementsByTagName(). It then retrieves the parent element at the given index with NodeList.item() and casts it to an Element.

The child element is located the same way, except that this time getElementsByTagName() is called on the parent Element, so only elements inside that parent are searched. In the bookstore file each book has exactly one element of each child tag, so the list has length 1 and the child is retrieved with NodeList.item(0).

Lastly, the child value is read from the child element’s first child node. If that node contains character data, its text is returned; otherwise getChildValue() returns an empty string.
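Reading only the first child node works for simple elements like those in bookstore.xml, but an element’s text can be split across several child nodes. With the default DocumentBuilderFactory settings a CDATA section stays a separate node, so the first-child approach returns only part of the text, while Node.getTextContent() concatenates all of it. A minimal sketch comparing the two; the class and method names and the sample title are made up for illustration:

```java
import org.w3c.dom.CharacterData;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;

public class ChildTextDemo {
    // Parses XML held in a string and returns its root element
    public static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // First-child approach, as used by getChildValue()
    public static String firstChildText(Element field) {
        Node child = field.getFirstChild();
        if (child instanceof CharacterData) {
            return ((CharacterData) child).getData();
        }
        return "";
    }

    public static void main(String[] args) throws Exception {
        // The CDATA section splits the text into three child nodes:
        // "Salt ", the CDATA section "&", and " Pepper"
        Element title = parseRoot("<title>Salt <![CDATA[&]]> Pepper</title>");
        System.out.println("[" + firstChildText(title) + "]");  // [Salt ]
        System.out.println("[" + title.getTextContent() + "]"); // [Salt & Pepper]
    }
}
```

Calling setCoalescing(true) on the factory is another option: it converts CDATA sections to text and merges them with the adjacent text nodes.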

Get Element Attribute Value

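The getAttributeValue() method looks up the node list by element tag, as the previous two methods do, takes the element at the given index, and then calls Element.getAttribute() to return the value of the named attribute. If the element has no such attribute, getAttribute() returns an empty string.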
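Because getAttribute() returns an empty string both for a missing attribute and for one that is present but empty, a caller that needs to tell the two apart can check Element.hasAttribute() first. A minimal sketch; the isbn attribute and the attributeOrNull() helper are hypothetical:

```java
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;

public class AttributeDemo {
    // Parses XML held in a string and returns its root element
    public static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Hypothetical variant of getAttributeValue(): null when the attribute is absent
    public static String attributeOrNull(Element element, String name) {
        return element.hasAttribute(name) ? element.getAttribute(name) : null;
    }

    public static void main(String[] args) throws Exception {
        Element book = parseRoot("<book category=\"cooking\" isbn=\"\"/>");
        System.out.println(book.getAttribute("category"));             // cooking
        System.out.println("[" + book.getAttribute("price") + "]");    // [] (absent)
        System.out.println("[" + book.getAttribute("isbn") + "]");     // [] (present but empty)
        System.out.println(attributeOrNull(book, "price"));            // null
        System.out.println("[" + attributeOrNull(book, "isbn") + "]"); // []
    }
}
```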

Test Application

Code

The ParseTest class parses the bookstore.xml file and then prints the attribute and child values for each book.

public class ParseTest {
    public static void main(String[] args) {
        try {
            XmlDOMDocument doc = new XmlDOMDocument(new XmlDOMParser(), "./bookstore.xml");
            int count = doc.getElementCount("book");
            for (int i = 0; i < count; i++) {
                // DOM indices start at 0; number the books from 1 for display
                System.out.println("Book " + (i + 1));
                System.out.println("book category - " + doc.getAttributeValue("book", i, "category"));
                System.out.println("book title    - " + doc.getChildValue("book", i, "title"));
                System.out.println("book author   - " + doc.getChildValue("book", i, "author"));
                System.out.println("book year     - " + doc.getChildValue("book", i, "year"));
                System.out.println("book price    - " + doc.getChildValue("book", i, "price"));
            }
        }
        catch (Exception ex) {
            // Parser configuration, I/O, and XML syntax errors all end up here
            ex.printStackTrace();
        }
    }
}

Build and Run

You can get the code for the project on GitHub at https://github.com/vichargrave/xmldom-java.git. You’ll need IntelliJ IDEA (Ultimate or Community Edition) to build the project. After you clone it, follow these instructions to build and run the test application:

  1. Double-click the xmldom-java.ipr file to load the project.
  2. Select Build > Make Project from the top menu bar.
  3. Select Run > ‘Run ParseTest’ from the top menu bar.

The output from ParseTest will look like this:

Book 1
book category - cooking
book title    - Everyday Italian
book author   - Giada De Laurentis
book year     - 2005
book price    - 30.00
Book 2
book category - children
book title    - Harry Potter and the Half-Blood Prince
book author   - J. K. Rowling
book year     - 2005
book price    - 29.99
