Debugging Hadoop Applications with IntelliJ

In my last blog post, I explained how to create and configure a Hadoop development environment so that you can build the jars and example applications from the Hadoop source code you get from the Apache Hadoop trunk repository.

This time around I’ll show you how to debug your Hadoop applications using the IntelliJ Community Edition IDE.  I’m going to discuss two different projects, one to debug the PI estimation program from the Hadoop examples jar file and the other to debug the WordCount application.

Hadoop PI Estimation Example

Open the Hadoop IntelliJ Project

I’m going to assume that you already have all the IntelliJ and Hadoop development tools you need.

  1. cd into your hadoop directory.
  2. Type the following command to create the Hadoop IntelliJ projects.
    mvn idea:idea
  3. Open IntelliJ.
  4. Select Open… from the top menu bar.
  5. Browse to the hadoop-main.ipr file in your hadoop directory.
  6. Open hadoop-main.ipr.

Create Run and Debug Configuration

Now that you have the Hadoop project, you are going to create a run and debugging configuration for the Hadoop MapReduce example programs. In this case we’ll be debugging the Hadoop PI estimation program.

  1. Select Run > Edit Configurations…  from the top menu bar.
  2. Click on the ‘+‘ symbol in the upper left hand corner of the Run/Debug Configurations screen.
  3. Select Application in the drop down menu.
  4. Enter standalone as the configuration name.
  5. Enter org.apache.hadoop.util.RunJar as the main class.
  6. Enter the location of your hadoop directory as the working directory. In my case it is /home/vic/apache/hadoop. To simplify the nomenclature, I’ll refer to this directory as ${HADOOP} for the remainder of the blog.
  7. The hadoop trunk version that I pulled down is hadoop-3.0.0-SNAPSHOT. The Hadoop examples jar is located at:
    ${HADOOP}/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar

    Click on the program arguments button then enter the path of the Hadoop examples jar and the PI estimation arguments as shown below.

Run-Debug Configuration

  1. Click on the Close button.
  2. Click on the OK button in the Run/Debug Configurations screen.

Debug Hadoop PI Estimation

With the Run/Debug configuration in place, you can either run the PI estimation program straight away or step through it in the IntelliJ debugger. Let’s do some debugging first.

  1. Open the main PI estimation file QuasiMonteCarlo.java from the following location:
    ${HADOOP}/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/QuasiMonteCarlo.java
  2. Next click to the left of the code window on each line where you want to break. Each breakpoint line will have a red circle next to it as shown below.

Break points

  1. To start debugging click on the bug icon in the toolbar at the top of the IntelliJ window.
  2. You’ll see a blue bar at each point where you break and a debug window will open up at the bottom of the window. You can use the debugging controls to the right of the Console tab to step through the code.

Debugging

To run the PI estimation straight through you can click on the green triangle in the toolbar at the top of the IntelliJ window. If you run the program with the arguments entered into the standalone configuration earlier, the output will look like this:

Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
Job Finished in 1.714 seconds
Estimated value of Pi is 3.20000000000000000000

Process finished with exit code 0
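
If you are wondering what you are actually stepping through: as far as I can tell, QuasiMonteCarlo estimates Pi by generating points from a Halton sequence in the unit square and counting how many fall inside the inscribed circle. Here is a minimal single-process sketch of that idea with no Hadoop involved. The names PiSketch, halton and estimatePi are made up for illustration, and the real example spreads the sampling across map tasks instead of one loop.

```java
class PiSketch {
    /** Radical inverse of index in the given base, i.e. one coordinate of a Halton point. */
    static double halton(long index, int base) {
        double result = 0.0;
        double fraction = 1.0 / base;
        while (index > 0) {
            result += (index % base) * fraction;
            index /= base;
            fraction /= base;
        }
        return result;
    }

    /** Estimates Pi from the share of sample points that land inside the inscribed circle. */
    static double estimatePi(long samples) {
        long inside = 0;
        for (long i = 1; i <= samples; i++) {
            // Shift the point so the circle of radius 0.5 is centered at the origin.
            double x = halton(i, 2) - 0.5;
            double y = halton(i, 3) - 0.5;
            if (x * x + y * y <= 0.25) {
                inside++;
            }
        }
        // The circle covers Pi/4 of the unit square, so scale the ratio by 4.
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        // Sample count is an arbitrary choice for this sketch.
        System.out.println("Estimated value of Pi is " + estimatePi(100000));
    }
}
```

The more samples you use, the closer the estimate gets, which is worth keeping in mind when you pick the arguments for the Hadoop job.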

WordCount Example

Create a WordCount IntelliJ Project

The process for debugging the WordCount 1.0 example from the Hadoop MapReduce Tutorial is similar to the Hadoop PI Estimation, except this time we have to create an IntelliJ project from scratch.

  1. Create this directory for your WordCount app:
    ${HOME}/WordCount
  2. Open IntelliJ.
  3. Select New Project… from the top menu bar.
  4. Select Java Module in the New Project screen.
  5. Set the project name to WordCount.
  6. Click Next then OK.

New Project

  1. Right click on the WordCount/src folder in the Project explorer.
  2. Select New > Java Class.
  3. Enter the class name as WordCount.
  4. Click on OK.
  5. Copy the WordCount 1.0 source code from the tutorial and paste it into your WordCount.java file.
  6. Select File > Save.
  7. Select File > Project Structure…
  8. Select Modules in the Project Structure screen

New Project Structure

  1. Click on ‘+’ in the Dependencies tab
  2. Go to this directory in the Hadoop distribution:
    ${HADOOP}/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/
  3. Select the subdirectories as shown below:

Add Hadoop jars

  1. Click OK.
  2. Click on ‘+‘ in the Dependencies tab again.
  3. Select this directory in the Hadoop share directory:
    ${HADOOP}/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/share/hadoop/common/lib
  4. Click OK. Your module structure should look like this:

Project Structure

  1. Still in the Project Structure screen, select Artifacts.
  2. Click on the ‘+‘ at the top of the screen.

Artifacts

  1. Select Add > Jar > Empty from the drop down menu.
  2. Set the artifact name to WordCount.
  3. Set the output directory to:
    ${HOME}/WordCount

WordCount Artifacts

  1. Click on the ‘+‘ in the Output Layout tab.
  2. Select Module Output.
  3. Select WordCount in the Choose Module screen.

Choose Module

  1. Click OK. The artifacts should look like this:

WordCount Module Output

  1. Click on OK in the Project Structure screen.
  2. Build the WordCount jar by selecting Build > Build Artifacts… 
  3. Select WordCount > Rebuild.

Create Run and Debug Configuration

Follow the steps for creating a run and debug configuration discussed previously except for the following settings:

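  1. Program arguments should be:
    ${HOME}/WordCount/WordCount.jar input/ output/
    If the WordCount jar’s manifest has no Main-Class entry, RunJar expects the class name right after the jar path, so use:
    ${HOME}/WordCount/WordCount.jar WordCount input/ output/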
  2. Working directory is:
    ${HOME}/WordCount
  3. Set the classpath to WordCount.
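
The order of the program arguments matters because RunJar has to work out which class to launch. Here is a simplified sketch of that lookup as I understand it: read Main-Class from the jar’s manifest, and fall back to the argument after the jar path if there isn’t one. This is not the Hadoop source; MainClassResolver, resolveMainClass and writeDemoJar are hypothetical names, and writeDemoJar only exists to give the demo something to open.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

class MainClassResolver {
    /** Returns the class a RunJar-style launcher would pick for "jarFile [mainClass] args...". */
    static String resolveMainClass(String[] args) {
        if (args.length < 1) {
            throw new IllegalArgumentException("usage: jarFile [mainClass] args...");
        }
        String mainClass = null;
        try (JarFile jar = new JarFile(args[0])) {
            Manifest manifest = jar.getManifest();
            if (manifest != null) {
                mainClass = manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (mainClass == null) {
            // No Main-Class in the manifest: the class name must follow the jar path.
            if (args.length < 2) {
                throw new IllegalArgumentException("no Main-Class in manifest and no class argument");
            }
            mainClass = args[1];
        }
        return mainClass;
    }

    /** Demo helper: writes a jar containing only a manifest; Main-Class is set if mainClass is non-null. */
    static Path writeDemoJar(String mainClass) {
        try {
            Path jar = Files.createTempFile("demo", ".jar");
            Manifest manifest = new Manifest();
            manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
            if (mainClass != null) {
                manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
            }
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jarOut = new JarOutputStream(out, manifest)) {
                // The constructor writes the manifest entry; nothing else is needed here.
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String withMain = writeDemoJar("WordCount").toString();
        String withoutMain = writeDemoJar(null).toString();
        // Manifest names the class, so the arguments start right after the jar path.
        System.out.println(resolveMainClass(new String[] {withMain, "input/", "output/"}));
        // No Main-Class entry, so the class name has to be the second argument.
        System.out.println(resolveMainClass(new String[] {withoutMain, "WordCount", "input/", "output/"}));
    }
}
```

If your run configuration fails with a class-not-found style error on input/, checking the manifest of WordCount.jar this way is a quick first step.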

Create the Input Files

Finally you have to set up the input directory and files.

  1. Create this directory for the WordCount input files:

    ${HOME}/WordCount/input
  2. Create a text file in this directory called file001.
  3. Put these words in file001: Hello World Bye World.
  4. Create a text file in the same location called file002.
  5. Put these words in file002: Hello Hadoop Goodbye Hadoop.
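
If you would rather script these steps than create the files by hand, a small Java program can do the same thing. This is just a sketch: WordCountInput and createInputFiles are names I picked, and it assumes the project lives in ${HOME}/WordCount unless you pass a different directory as the first argument.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class WordCountInput {
    /** Creates projectDir/input with file001 and file002 and returns the input directory. */
    static Path createInputFiles(Path projectDir) {
        try {
            Path input = Files.createDirectories(projectDir.resolve("input"));
            Files.write(input.resolve("file001"),
                    "Hello World Bye World\n".getBytes(StandardCharsets.UTF_8));
            Files.write(input.resolve("file002"),
                    "Hello Hadoop Goodbye Hadoop\n".getBytes(StandardCharsets.UTF_8));
            return input;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Defaults to ${HOME}/WordCount; pass another directory to override it.
        Path projectDir = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("user.home"), "WordCount");
        System.out.println("Created " + createInputFiles(projectDir));
    }
}
```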

Debug WordCount

Now you are ready to run or debug WordCount. If you run the program, you will get a WordCount/output directory with a file called _SUCCESS and a results file named part-00000 that contains the following:

Bye	1
Goodbye	1
Hadoop	2
Hello	2
World	2
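
To sanity check these counts without Hadoop, you can do the same tokenize-and-sum in plain Java. This is only a local sketch of what the map and reduce steps add up to, not the tutorial’s WordCount class; LocalWordCount and count are hypothetical names, and a TreeMap stands in for the sorted reducer output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class LocalWordCount {
    /** Counts whitespace-separated words across all lines, with keys kept in sorted order. */
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // "Map" side: break each line into words.
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                // "Reduce" side: add one to the running total for the word.
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same words as file001 and file002.
        List<String> lines = Arrays.asList(
                "Hello World Bye World",
                "Hello Hadoop Goodbye Hadoop");
        for (Map.Entry<String, Integer> entry : count(lines).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```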

5 Comments

Eugene Koontz on March 13, 2013 at 3:36 pm.

Very nice, thanks for this illustrative guide.

vic on March 14, 2013 at 10:49 am.

Thanks for teaching me how to debug Hadoop applications Eugene and for visiting my site.

Pavel Plichko on June 24, 2013 at 6:40 am.

Thanks a lot. This article was very helpful for me.
But I want to make a note about the “Program Arguments” option for the WordCount application.

${HOME}/WordCount/WordCount.jar input/ output/

It doesn’t work for me, because RunJar takes the second parameter as the class to run (Hadoop version 1.1.2).
So the following “Program Arguments” work fine for me:
${HOME}/WordCount/WordCount.jar WordCount input/ output/

Abhiejet on January 28, 2014 at 8:44 am.

Nice tutorial. Thank you.
