XML Parsing with DOM in C++

Having the ability to parse XML files is a requirement for a lot of applications these days. XML is a standard format for exchanging data between programs and storing configuration data.

If you want to parse XML documents in C++, you can benefit from using an external library like the Xerces-C++ XML Parser. Xerces provides an elaborate but somewhat complex API for navigating XML files. To simplify matters, I’ll describe a C++ class that encapsulates the Xerces calls to index and retrieve XML element values and attributes.


XML Parsing Models

XML Elements

XML documents consist of elements that are denoted by beginning and ending tags. XML elements are of the general form:

<element attribute="attribute-value">value</element>

where value consists of either a string value or additional XML elements. An attribute is a named value associated with the given element.

Here is an example of an XML document that represents two books in a bookstore. The bookstore element contains two book elements, each with a category attribute. Each book element contains fields that describe the book.

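<bookstore>
    <book category="cooking">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentis</author>
        <year>2005</year>
        <price>30.00</price>
    </book>
    <book category="children">
        <title lang="en">Harry Potter and the Half-Blood Prince</title>
        <author>J. K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
    </book>
</bookstore>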

The bookstore in this case is analogous to a database table with two book rows, and the title, author, year and price fields are the columns of the rows.

SAX Model

XML files can be parsed using two different models, SAX (Simple API for XML) and DOM (Document Object Model). With SAX parsing, the XML document is traversed sequentially and, as each element is visited, its contents are passed back to the calling application. When the beginning and ending tags of a section are encountered, e.g. the book sections in the example XML, the caller is notified so it can keep track of each section and knows that other elements will follow.

Since SAX parsing visits each element one at a time, it is fast and does not make heavy demands on memory. It is also possible to process XML documents of arbitrary size. However, SAX requires the calling application to do all the heavy lifting when it comes to storing the XML field values.
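To make that division of labor concrete, here is a minimal SAX-style sketch that uses only the standard library. The scanXml() function, the SaxHandler interface and the Book struct are hypothetical illustrations, not Xerces classes, and the toy scanner only understands simple, well-formed markup like the bookstore example (no comments, CDATA sections, entities or self-closing tags). Note that the scanner only reports events; it is the handler that has to remember which book it is in and where to store each field.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical SAX-style callback interface (not the Xerces API).
struct SaxHandler {
    virtual ~SaxHandler() {}
    virtual void startElement(const std::string& name,
                              const std::map<std::string, std::string>& attrs) = 0;
    virtual void characters(const std::string& text) = 0;
    virtual void endElement(const std::string& name) = 0;
};

// Toy scanner: one pass over the text, firing callbacks as tags and text are
// seen; nothing is kept once a callback returns. Assumes simple, well-formed
// input with single spaces between attributes.
void scanXml(const std::string& xml, SaxHandler& h) {
    std::string::size_type pos = 0;
    while (pos < xml.size()) {
        std::string::size_type lt = xml.find('<', pos);
        if (lt == std::string::npos) break;
        if (lt > pos) h.characters(xml.substr(pos, lt - pos));
        std::string::size_type gt = xml.find('>', lt);
        if (gt == std::string::npos) break;
        std::string tag = xml.substr(lt + 1, gt - lt - 1);
        if (!tag.empty() && tag[0] == '/') {
            h.endElement(tag.substr(1));
        } else {
            std::map<std::string, std::string> attrs;
            std::string::size_type sp = tag.find(' ');
            std::string name = tag.substr(0, sp);
            while (sp != std::string::npos) {       // name="value" pairs
                std::string::size_type eq = tag.find('=', sp);
                if (eq == std::string::npos) break;
                std::string::size_type q1 = tag.find('"', eq);
                std::string::size_type q2 = tag.find('"', q1 + 1);
                attrs[tag.substr(sp + 1, eq - sp - 1)] =
                    tag.substr(q1 + 1, q2 - q1 - 1);
                sp = tag.find(' ', q2);
            }
            h.startElement(name, attrs);
        }
        pos = gt + 1;
    }
}

struct Book {
    std::string category, title, author, year, price;
};

// The application does the bookkeeping: it tracks the current book and the
// current field, and copies each piece of text into the right member.
class BookCollector : public SaxHandler {
public:
    std::vector<Book> books;

    void startElement(const std::string& name,
                      const std::map<std::string, std::string>& attrs) override {
        if (name == "book") {
            books.push_back(Book());
            std::map<std::string, std::string>::const_iterator it =
                attrs.find("category");
            if (it != attrs.end()) books.back().category = it->second;
        }
        field_ = name;
    }
    void characters(const std::string& text) override {
        if (books.empty()) return;              // text before the first book
        Book& b = books.back();
        if (field_ == "title") b.title += text;
        else if (field_ == "author") b.author += text;
        else if (field_ == "year") b.year += text;
        else if (field_ == "price") b.price += text;
    }
    void endElement(const std::string&) override { field_.clear(); }

private:
    std::string field_;  // element whose text we are currently inside
};
```

Feeding the bookstore XML to scanXml() with a BookCollector should leave one Book entry per book element in collector.books; whitespace between tags is simply ignored because no field is active at that point.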

DOM Model

With DOM parsing the entire XML document is read into memory and organized in the form of a tree as shown in the following diagram.


Example XML Document Diagram
Source: XML DOM Node Tree by w3schools.com

The root element is bookstore and its child elements are book elements. The bookstore is the parent element of the book elements. Each book element is the parent of four child elements: title, author, year and price.

When using DOM it is possible to index through each parent and child element, so the calling application does not have to maintain the document structure as it does with SAX.

The downside of DOM is that the size of the document you can parse is limited by the amount of memory available to the application, and parsing is less efficient than with SAX.

Xerces Installation

Before diving into the XML DOM parsing API, let’s go over how to install Xerces. You can get the Xerces library in binary form for various platforms, but I built my example on Mac OS, so I elected to build from source.

  1. Download Xerces 3.1.1 from the download site.
  2. Place the tarball in your home directory or wherever.
  3. tar zxvf xerces-c-3.1.1.tar.gz
  4. cd xerces-c-3.1.1/
  5. ./configure
  6. make
  7. sudo su
  8. make install

This will place the Xerces headers and library in /usr/local on your system.

Xerces Platform Initialization

Before we do any parsing, the Xerces platform must first be initialized, which involves the following three steps:

  1. Call XMLPlatformUtils::Initialize()
  2. Create a XercesDOMParser object.
  3. Create an error handler for the parser.

For convenience we’ll do these three steps in a single function call.

XercesDOMParser*   parser = NULL;
ErrorHandler*      errorHandler = NULL;

void createParser()
{
    if (!parser)
    {
        XMLPlatformUtils::Initialize();
        parser = new XercesDOMParser();
        errorHandler = (ErrorHandler*) new XmlDomErrorHandler();
        parser->setErrorHandler(errorHandler);
    }
}

We only need one parser, so createParser() does the platform initialization and parser creation just once. The error handler class is derived from the Xerces HandlerBase class as follows:

class XmlDomErrorHandler : public HandlerBase {
  public:
    void fatalError(const SAXParseException &exc) {
        printf("Fatal parsing error at line %d\n", (int)exc.getLineNumber());
    }
};

When the parser encounters a fatal error, such as malformed XML, it calls fatalError(), which displays an error message with the line number in the XML file where the error occurred.

XmlDomDocument Class

The XmlDomDocument class encapsulates the Xerces DOM API. The class interface and definition are contained in the XmlDomDocument.h and XmlDomDocument.cpp files respectively.  Note that the createParser() code in the previous section is also defined in the XmlDomDocument.cpp file.

#ifndef XMLDOMDOCUMENT_H
#define XMLDOMDOCUMENT_H

#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/dom/DOM.hpp>
#include <xercesc/sax/HandlerBase.hpp>
#include <xercesc/util/XMLString.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <string>

using namespace std;
using namespace xercesc;

class XmlDomDocument
{
    DOMDocument* m_doc;   // adopted from the parser, released in the destructor

  public:
    XmlDomDocument(const char* xmlfile);
    ~XmlDomDocument();

    string getChildValue(const char* parentTag, int parentIndex,
                         const char* childTag, int childIndex);
    string getChildAttribute(const char* parentTag, int parentIndex,
                             const char* childTag, int childIndex,
                             const char* attributeTag);
    int getChildCount(const char* parentTag, int parentIndex,
                      const char* childTag);

  private:
    // Not implemented: the object may only be built from an XML file name.
    XmlDomDocument();
    XmlDomDocument(const XmlDomDocument&);
};

#endif


The constructor calls createParser(), which is defined in the XmlDomDocument.cpp file and not declared in the header, to initialize the Xerces platform. It then calls XercesDOMParser::parse() to parse the given XML file and adoptDocument() to take ownership of the resulting DOMDocument, whose pointer is stored in the m_doc member variable. The XmlDomDocument default and copy constructors are declared private since we only want this object created one way, with a constructor that accepts the name of the XML file to be parsed.

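XmlDomDocument::XmlDomDocument(const char* xmlfile) : m_doc(NULL) {
    createParser();                    // one-time platform and parser setup
    parser->parse(xmlfile);            // parse() returns void...
    m_doc = parser->adoptDocument();   // ...so take ownership of the result
}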


Since the DOMDocument is “adopted” by the XmlDomDocument, we must release the memory consumed by the document when the XmlDomDocument is destroyed.

XmlDomDocument::~XmlDomDocument() { if (m_doc) m_doc->release(); }
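With C++11 or later, the same ownership rule can be expressed with std::unique_ptr and a custom deleter, so the release happens automatically whenever the owner goes out of scope, including when an exception unwinds past it. In this sketch FakeDocument is a hypothetical stand-in for DOMDocument (which, like the real class, is freed with release() rather than delete) so that it compiles without Xerces; OwnedDocument and ReleaseDeleter are likewise illustrative names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for DOMDocument: freed via release(), not delete.
struct FakeDocument {
    static int released;
    void release() { ++released; delete this; }
};
int FakeDocument::released = 0;

// Deleter that hands the document back the way a DOMDocument expects.
struct ReleaseDeleter {
    void operator()(FakeDocument* d) const { if (d) d->release(); }
};

// Owns the adopted document. unique_ptr already makes the class non-copyable;
// the deleted copy operations spell that out, as the C++11 counterpart of
// declaring the copy constructor private.
class OwnedDocument {
public:
    explicit OwnedDocument(FakeDocument* d) : doc_(d) {}
    OwnedDocument(const OwnedDocument&) = delete;
    OwnedDocument& operator=(const OwnedDocument&) = delete;
    FakeDocument* get() const { return doc_.get(); }

private:
    std::unique_ptr<FakeDocument, ReleaseDeleter> doc_;
};
```

Adapted to this article’s class, m_doc would become a std::unique_ptr whose deleter calls DOMDocument::release(), and the hand-written destructor would no longer be needed.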

Get Child Element Value

XmlDomDocument::getChildValue() takes the name of a parent tag, the index of that parent tag in the XML file, the name of a child tag and the index of that child tag within the parent. For example, if I want to get the price of the Harry Potter book from the example XML file, the parent tag is “book”, the parent index is 1 (as with C/C++ arrays, indexing starts from 0), the child tag is “price” and the child index is 0.

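string XmlDomDocument::getChildValue(const char* parentTag,
                                     int parentIndex,
                                     const char* childTag,
                                     int childIndex)
{
    // Find all parent elements with the given tag name.
    XMLCh* temp = XMLString::transcode(parentTag);
    DOMNodeList* list = m_doc->getElementsByTagName(temp);
    XMLString::release(&temp);

    // Pick the parent by index, then the child by tag name and index.
    DOMElement* parent =
        dynamic_cast<DOMElement*>(list->item(parentIndex));
    DOMElement* child = NULL;
    if (parent) {
        temp = XMLString::transcode(childTag);
        child = dynamic_cast<DOMElement*>(
            parent->getElementsByTagName(temp)->item(childIndex));
        XMLString::release(&temp);
    }

    string value;
    if (child) {
        char* temp2 = XMLString::transcode(child->getTextContent());
        value = temp2;
        XMLString::release(&temp2);
    }
    else {
        value = "";
    }
    return value;
}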

Xerces does not work with char strings directly; it represents strings as null-terminated arrays of XMLCh, a 16-bit (UTF-16) character type. So whenever we want to exchange strings with the library, we must convert them with XMLString::transcode(), which returns a newly allocated XMLCh pointer when passed a pointer to a character string. The XMLCh pointer is then used in the call to DOMDocument::getElementsByTagName(), which returns a pointer to a DOMNodeList object. After we are done with the transcoded string, we must release its memory back to the heap with a call to XMLString::release(). This is a very common Xerces string usage pattern.
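Pairing every transcode() with a release() by hand is easy to get wrong, especially once a function has early returns. One common remedy is a small RAII wrapper that transcodes in its constructor and releases in its destructor. In this sketch, fakeTranscode() and fakeRelease() are hypothetical stand-ins for XMLString::transcode() and XMLString::release() (ASCII only) so that it compiles without Xerces; TranscodedString is likewise an illustrative name, not a Xerces class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for XMLString::transcode() / XMLString::release().
// Like the real calls, transcode() returns a heap buffer that the caller must
// hand back to release(). ASCII only.
typedef char16_t Utf16Char;
int g_liveBuffers = 0;  // buffers handed out and not yet released

Utf16Char* fakeTranscode(const char* s) {
    std::size_t n = std::strlen(s);
    Utf16Char* buf = new Utf16Char[n + 1];
    for (std::size_t i = 0; i <= n; ++i)  // copies the terminating NUL too
        buf[i] = static_cast<unsigned char>(s[i]);
    ++g_liveBuffers;
    return buf;
}

void fakeRelease(Utf16Char** buf) {
    delete[] *buf;
    *buf = nullptr;
    --g_liveBuffers;
}

// RAII guard: transcodes on construction and releases on destruction, so the
// buffer is freed on every path out of the scope, including early returns.
class TranscodedString {
public:
    explicit TranscodedString(const char* s) : buf_(fakeTranscode(s)) {}
    ~TranscodedString() { fakeRelease(&buf_); }
    TranscodedString(const TranscodedString&) = delete;
    TranscodedString& operator=(const TranscodedString&) = delete;
    const Utf16Char* get() const { return buf_; }

private:
    Utf16Char* buf_;
};
```

Adapted to Xerces (holding an XMLCh* and calling the real transcode() and release()), a wrapper like this would let getChildValue() pass tag.get() to getElementsByTagName() without a matching release() call.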

In the Xerces DOM model, a parsed XML file is a tree of DOMNode objects with a single root element. DOMDocument::getElementsByTagName() returns a DOMNodeList containing every element in the document with the given tag name, in document order, so a parent can be retrieved by tag name and index. DOMElement::getElementsByTagName() does the same for the descendants of a single element, so a child can be retrieved by tag name and index within its parent. Getting back to our Harry Potter example, the root element is “bookstore”, we want the second “book” parent, referenced by index 1, and we want the first child named “price”, at index 0. DOMNodeList::item() returns a pointer to the node at the given index, or NULL if the index is out of range, which is cast to a DOMElement pointer. The child element is obtained the same way, by calling getElementsByTagName() on the parent element and taking the item at the specified child index.

If we get a non-NULL child element, its value is obtained by calling getTextContent(), which returns a const XMLCh pointer. That string is transcoded back to a char string, copied into a string object and returned to the caller, and the transcoded buffer is released. Otherwise an empty string is returned.
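The index arithmetic is easier to see on a small in-memory tree. This standard-library sketch mimics the two lookups getChildValue() performs; Node, elementsByTag(), makeBookstore() and this free-function getChildValue() are hypothetical illustrations, not Xerces code. As in the DOM, the tree has a document node above the root element, and the tag search returns all matching descendants in document order. One simplification: getTextContent() concatenates all text beneath an element, whereas Node::text here holds only a leaf’s own text.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified element: tag name, leaf text, attributes, children.
struct Node {
    std::string name;
    std::string text;
    std::map<std::string, std::string> attrs;
    std::vector<Node> children;
};

// Appends every descendant of 'root' named 'tag', in document order.
void elementsByTag(const Node& root, const std::string& tag,
                   std::vector<const Node*>& out) {
    for (const Node& c : root.children) {
        if (c.name == tag) out.push_back(&c);
        elementsByTag(c, tag, out);
    }
}

// The parentIndex-th parentTag element, then its childIndex-th childTag
// descendant; "" when either index is out of range (where item() gives NULL).
std::string getChildValue(const Node& doc, const std::string& parentTag,
                          std::size_t parentIndex, const std::string& childTag,
                          std::size_t childIndex) {
    std::vector<const Node*> parents;
    elementsByTag(doc, parentTag, parents);
    if (parentIndex >= parents.size()) return "";
    std::vector<const Node*> children;
    elementsByTag(*parents[parentIndex], childTag, children);
    if (childIndex >= children.size()) return "";
    return children[childIndex]->text;
}

// A trimmed bookstore: a document node whose only child is the root element.
Node makeBookstore() {
    Node book1{"book", "", {{"category", "cooking"}}, {
        {"title", "Everyday Italian", {{"lang", "en"}}, {}},
        {"price", "30.00", {}, {}}}};
    Node book2{"book", "", {{"category", "children"}}, {
        {"title", "Harry Potter and the Half-Blood Prince", {{"lang", "en"}}, {}},
        {"price", "29.99", {}, {}}}};
    Node root{"bookstore", "", {}, {book1, book2}};
    return Node{"#document", "", {}, {root}};
}
```

With this tree, getChildValue(doc, "book", 1, "price", 0) walks to the second book and then its first price element, the same path as the Harry Potter example above.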

Get Child Attribute Value

Retrieving XML child attribute values is very similar to retrieving child element values, except that the attribute tag is also passed to the method. For example, if we wanted the book category for the Harry Potter book, the parent tag is “bookstore”, the parent index is 0, the child tag is “book”, the child index is 1 and the attribute tag is “category”.

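string XmlDomDocument::getChildAttribute(const char* parentTag,
         int parentIndex, const char* childTag, int childIndex,
         const char* attributeTag)
{
    XMLCh* temp = XMLString::transcode(parentTag);
    DOMNodeList* list = m_doc->getElementsByTagName(temp);
    XMLString::release(&temp);

    DOMElement* parent =
        dynamic_cast<DOMElement*>(list->item(parentIndex));
    DOMElement* child = NULL;
    if (parent) {
        temp = XMLString::transcode(childTag);
        child = dynamic_cast<DOMElement*>(
            parent->getElementsByTagName(temp)->item(childIndex));
        XMLString::release(&temp);
    }

    string value;
    if (child) {
        // Look up the attribute by name and convert its value to char*.
        temp = XMLString::transcode(attributeTag);
        char* temp2 = XMLString::transcode(child->getAttribute(temp));
        value = temp2;
        XMLString::release(&temp2);
        XMLString::release(&temp);
    }
    else {
        value = "";
    }
    return value;
}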

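As before, we first retrieve the parent element and then the child element, if any.

This time, instead of the element’s text, we get the value of the specified attribute: the attribute tag is transcoded to an XMLCh string and passed to DOMElement::getAttribute(), and the result is transcoded back to a char string.

The attribute value is returned, or an empty string if the specified child or attribute is not found.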
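One subtlety worth knowing: per the DOM specification, getAttribute() returns an empty string when the attribute is absent, so getChildAttribute() returns "" both for a missing attribute and for one that is present but empty. If that distinction matters, DOMElement::hasAttribute() can tell the two apart. The standard-library sketch below mimics those two behaviors on a plain std::map; the Attributes type and both functions are hypothetical stand-ins, not Xerces code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for one element's attributes; not a Xerces type.
typedef std::map<std::string, std::string> Attributes;

// Mirrors DOM getAttribute(): an absent attribute yields "", which is
// indistinguishable from an attribute that is present but empty.
std::string getAttribute(const Attributes& attrs, const std::string& name) {
    Attributes::const_iterator it = attrs.find(name);
    return it == attrs.end() ? std::string() : it->second;
}

// Mirrors DOM hasAttribute(): tells those two cases apart.
bool hasAttribute(const Attributes& attrs, const std::string& name) {
    return attrs.count(name) != 0;
}
```

A caller that needs to know whether the attribute exists would check hasAttribute() first and only then trust an empty result from getAttribute().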

Get Child Count

To get the number of elements with a given tag name under a given parent, we call DOMDocument::getElementsByTagName() with the parent tag, which returns a list of parent elements. We get the parent element at parentIndex and then call DOMElement::getElementsByTagName(), this time with childTag. As before, this gives us a pointer to a DOMNodeList, from which we can get the child count directly with a call to DOMNodeList::getLength().

int XmlDomDocument::getChildCount(const char* parentTag, int parentIndex,
                                  const char* childTag)
{
    XMLCh* temp = XMLString::transcode(parentTag);
    DOMNodeList* list = m_doc->getElementsByTagName(temp);
    XMLString::release(&temp);

    DOMElement* parent = dynamic_cast<DOMElement*>(list->item(parentIndex));
    if (!parent) return 0;

    temp = XMLString::transcode(childTag);
    DOMNodeList* childList = parent->getElementsByTagName(temp);
    XMLString::release(&temp);
    return (int)childList->getLength();
}
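Because getElementsByTagName() searches all descendants of the parent, not just its direct children, getChildCount() also counts nested matches. That makes no difference for the flat bookstore file, but it can for nested documents. This standard-library sketch (the Node type and both counting functions are hypothetical) shows the two counts diverging; with Xerces, direct children could instead be examined by walking DOMNode::getChildNodes() and keeping only element nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified element with just a tag name and children.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// What getElementsByTagName()->getLength() counts: matching descendants
// at any depth below 'n'.
std::size_t countDescendants(const Node& n, const std::string& tag) {
    std::size_t count = 0;
    for (const Node& c : n.children)
        count += (c.name == tag ? 1 : 0) + countDescendants(c, tag);
    return count;
}

// Only the immediate children of 'n' with a matching name.
std::size_t countDirectChildren(const Node& n, const std::string& tag) {
    std::size_t count = 0;
    for (const Node& c : n.children)
        if (c.name == tag) ++count;
    return count;
}
```

For a section holding one item directly and two more inside a nested group, the descendant count is 3 while the direct-child count is 1.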

Test Application


The test application is defined in the main.cpp file. It parses the bookstore.xml file, retrieves the attribute and child values of every book, and prints them to stdout.

#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <iostream>
#include "XmlDomDocument.h"

int main(int argc, char** argv)
{
    string value;
    XmlDomDocument* doc = new XmlDomDocument("./bookstore.xml");
    if (doc) {
        for (int i = 0; i < doc->getChildCount("bookstore", 0, "book"); i++) {
            printf("Book %d\n", i+1);
            value = doc->getChildAttribute("bookstore", 0, "book", i, "category");
            printf("book category   - %s\n", value.c_str());
            value = doc->getChildValue("book", i, "title", 0);
            printf("book title      - %s\n", value.c_str());
            value = doc->getChildAttribute("book", i, "title", 0, "lang");
            printf("book title lang - %s\n", value.c_str());
            value = doc->getChildValue("book", i, "author", 0);
            printf("book author     - %s\n", value.c_str());
            value = doc->getChildValue("book", i, "year", 0);
            printf("book year       - %s\n", value.c_str());
            value = doc->getChildValue("book", i, "price", 0);
            printf("book price      - %s\n", value.c_str());
        }
        delete doc;
    }
    exit(0);
}

Build and Run

You can get the source code for the project from GitHub: https://github.com/vichargrave/xmldom.git. To build it, just cd into the project directory and type make.

After building the test app run it as follows:

$ ./xmldom 
Book 1
book category   - cooking
book title      - Everyday Italian
book title lang - en
book author     - Giada De Laurentis
book year       - 2005
book price      - 30.00
Book 2
book category   - children
book title      - Harry Potter and the Half-Blood Prince
book title lang - en
book author     - J. K. Rowling
book year       - 2005
book price      - 29.99


Comments on this post

  1. Great roundup on xml parsing, i was wondering whether you have used an xml editor at all for parsing or what you think of them as a parsing tool?

    • vic

      They are fine for visualizing an entire file, but I don’t know of any that you can embed in your programs, like sed, awk or xerces, to parse files.

  2. Michael Knafo

    Thanks, it helped me a lot, your example is simple and clear.

  3. Boris Nasir

    Thank you for your information this is a very good example for beginners but when I try to execute your example I get

    make: *** No targets specified and no makefile found. Stop.

    What is the problem ?

    • Boris Nasir

      Sorry, I named the Makefile as MakeFile, this correction solved the problem.

      However this time I get a lot of undefined reference error and exit with
      make: *** [xmldom] Error 1

      full form of error :

      g++ -lxerces-c main.o xmldom.o -o xmldom
      main.o: In function `xercesc_3_1::XMLAttDefList::~XMLAttDefList()’:
      main.cpp:(.text._ZN11xercesc_3_113XMLAttDefListD2Ev[_ZN11xercesc_3_113XMLAttDefListD5Ev]+0x37): undefined reference to `xercesc_3_1::XMemory::operator delete(void*)’
      main.o: In function `xercesc_3_1::XMLAttDefList::~XMLAttDefList()’:
      main.cpp:(.text._ZN11xercesc_3_113XMLAttDefListD0Ev[_ZN11xercesc_3_113XMLAttDefListD5Ev]+0x20): undefined reference to `xercesc_3_1::XMemory::operator delete(void*)’
      main.o: In function `xercesc_3_1::DTDEntityDecl::~DTDEntityDecl()’:

      collect2: ld returned 1 exit status
      make: *** [xmldom] Error 1

      • vic

        It appears you don’t have xerces installed. Download and install it, then you should get better results.

        • Boris Nasir

          Thank you for your answer, but the problem was your makefile :) I tried it with the makefile below and it worked.

          • vic

            OK glad you got it to work. However the Makefile I provided works fine on Linux and Mac OS. There must have been a problem with your installation.

  4. Chris

    Thanks a lot for this very good tutorial. Unfortunately, the makefile did not work for me. Instead I used the following makefile:

    CPPFLAGS=-g -ggdb3
    LDFLAGS=-g -ggdb3
    LDLIBS= -lxerces-c

    SRCS=$(wildcard *.cpp)
    OBJS=$(subst .cpp,.o,$(SRCS))

    all: $(OUTFILE)

    $(OUTFILE): $(OBJS)
    	g++ $(LDFLAGS) -o $(OUTFILE) $(OBJS) $(LDLIBS)

    depend: .depend

    .depend: $(SRCS)
    	rm -f ./.depend
    	$(CXX) $(CPPFLAGS) -MM $^>>./.depend;

    clean:
    	$(RM) $(OBJS)

    dist-clean: clean
    	$(RM) *~ .dependtool

    include .depend

    (taken from http://stackoverflow.com/questions/2481269/how-to-make-simple-c-makefile )

    PS.: The Captcha is almost impossible to get right and the comment is gone afterwards…

  5. parveen

    Really good work…thanks a lot.

  6. Elvis

    I have problem with parsing a string instead of an xml file. I tried with MemBufInputSource but still XercesDOMParser can’t parse it. here is my source code:

    XercesDOMParser* parser = new XercesDOMParser();
    ErrorHandler* errHandler = (ErrorHandler*) new HandlerBase();

    MemBufInputSource source((const XMLByte*) clearInput.c_str(), clearInput.length(), "dummy");
    parser->parse( source );

    • vic

      I’m not familiar with the MemBufInputSource class, but there is an example of how to use it in the xerces-c-3.1.1/samples/src/MemParse/MemParse.cpp file.

  7. Soumya Prasad Ukil

    Does it support xpath-based query?

    • vic

      I’m not sure. Sorry.

  8. kp

    vichargrave ur code is working fine..but when i try to build in vc++ 2010..i am getting a error like “fatal error parsing line 0”…fatal error comes only when the xml file is corrupted but when i tried to parse with some example i am getting the same error..can u pls suggest?..thanks in advance

  9. milton ortiz

    great article, the best and clearest i’ve seen regarding xerces, do you plan on making a tutorial about parsing with sax? i need to read a xml of maybe 250 articles similar to your bookstore and the memory available is pretty limited, is sax the more convenient way to do this? how can i determine the memory amount, just a vague idea is helpful.
    congrats for your nice tutorial

  10. milton ortiz

    sorry if duplicated, i just don’t see my previous attempt…
    i liked your information a lot, pretty useful, is there any chance you could make a sax parsing tutorial? i am trying to read a xml, it is similar structure as your bookstore example, maybe 250 items and i wonder if sax is the recommended approach since i am pretty short on memory.
    is there any way to have an idea on the memory to be used?
    thanks a lot in advance

    • vic

      Sorry I have to approve comments before they appear. I get some weird stuff sometimes that I have to filter.

      SAX parsing will work if your application is memory constrained. Note, however, it does not let you search for fields the way a DOM parsing scheme does. Here is a pretty good tutorial on SAX parsing: http://www.mkyong.com/java/how-to-read-xml-file-in-java-sax-parser/.

      • milton ortiz

        thanks a lot for your answer, i’ve seen that tutorial but i need this to be done in c++ because a library is written in c++, i’ll try dom method to see if it works in my project.
        another question and is the last one, how can i search for the parent by name and not by index in your example? let’s say i have 250 items and i want to find “alice in wonderland” and retrieve all its child values but i don’t know the index of that book?
        really appreciate your help

  11. Eugin

    Helpful article. Please, what do you know about standard methods in C++ for DOM in the latest Microsoft libraries?

  12. Grant

    Have you ever seen this? The default error handler is called if an XML size is over 700k bytes, on Solaris 10:

    =>[1] __lwp_kill(0x0, 0x6, 0x0, 0x6, 0xffbffeff, 0x0), at 0xff24ebd4
    [2] raise(0x6, 0x0, 0xff2c7080, 0xff22e0f0, 0xffffffff, 0x6), at 0xff1e7bb0
    [3] abort(0x21133238, 0x1, 0xff0f54b4, 0xffb04, 0xff2c5518, 0x0), at 0xff1c29f0
    [4] __Cimpl::default_terminate(0x21133238, 0xff2c7940, 0x1c00, 0x1793c, 0x0, 0xff0f5010), at 0xff0f5014
    [5] __Cimpl::ex_terminate(0xff10d618, 0x0, 0x0, 0xff10d618, 0xff10cd10, 0x1), at 0xff0f4e24
    —- hidden frames, use ‘where -h’ to see them all —-
    [8] xercesc_3_1::AbstractDOMParser::parse(0x18dc010, 0x9e27b950, 0xb4db9000, 0xb4621a4c, 0xb4642c00, 0x0), at 0xb445a878

    Any idea how to handle this.

    Appreciate your help!

    • vic

      I have not seen that, but then I have not been working with XML documents that big. It may be that you are running into the limits of what DOM can handle. You may want to consider using SAX parsing which does not load the entire parsed document into memory.

  13. Robert Kennedy

    Good afternoon,

    First off, thanks for the tutorial and example. That’s helped a lot in getting started using the xerces stuff to process my XML. I got your example working and then started modifying it to work with my file instead of the bookstore one. That’s been going well so far, but I’ve run into a situation that I’m not sure how to handle. The file that I need to process has a structure in it similar to the following.



    There can be 0..N FooInstance entries within FooInstances, and the PrivateConfigs lists in each section that has one can have 0..20 instances of that. I can’t change the structure of the XML file, that’s controlled by someone else. So, how would I go about specifying on the call to getAttributeValue that I wanted an entry in the OutputAdapter section? I can get the counts PrivateConfig in the input and output adapters, but how do I get the actual data from those list elements? Is this going to require adding more methods to the XMLDomDocument class, or is there a way to do it with the existing methods that I just am not understanding?


    • Hi Robert.

      Per our private email, I have added the getChildAttribute() method and modified the article, and all subsequent XML articles, accordingly.

      Many thanks for your discussion and input.

  14. Prakash

    Hi Vic, Very good tutorial. Actually I took the binaries from http://archive.apache.org/dist/xerces/c/3/binaries/ and I suppose I don’t have to run ./configure. I straightway skipped steps 5 to 8 ./configure, make, sudo su and make install. And created a project in eclipse with Cygwin tool-chain and included all directories under xerces-c-3.1.1-x86_64-linux-gcc-3.4\include\xercesc in include directories area under eclipse C/C++ General/Paths and Symbols/include(tab). But still it shows ‘Unresolved inclusion: ‘. Am I missing something? I didn’t find better place than this to get help. This is my first time I m entering into c++ world.

    • I’ve tried this project with Cygwin and frankly I wouldn’t advise it. I used Linux and Mac OS for this project, and I advise you to use one of those platforms. Also follow the recipe as I’ve written it in the blog for best results.

      • Prakash

        Is there a solution for people who are working on windows? I was almost done as I can see all the required .hpp files under xerces-c-3.1.1-x86_64-linux-gcc-3.4\include\xercesc. My only problem is into Eclipse settings which I m trying for the first time.

        • I’ve used Xerces DLLs with Visual Studio C++, which works pretty well. That is what I would suggest you do.
