IntelliJ Project for Building Hadoop – The Definitive Guide Examples

I have been studying Hadoop – The Definitive Guide by Tom White and started building the sample applications with the Makefile I discussed in my last blog. Although the Makefile approach works, I decided to try using the IntelliJ Community Edition IDE to build all the examples in a given chapter at once.

This time around I’ll walk you through a procedure to create an IntelliJ project for building Hadoop applications.

 

Install IntelliJ

If you don’t have it already, you can get the latest version of IntelliJ Community Edition here. Select the package for your operating system of choice, either Mac OS or Linux, then install IntelliJ by placing the package contents in your directory of choice.

Install the Sample Code

  1. First you need the sample code from the Hadoop book, which you can get from GitHub.
  2. Place the code in a directory called hadoop-book.
  3. NOTE: Since I wrote this article, the lib directory has gone missing from the book’s code. To make your life easier, click on this link hadoop-book-lib.tgz to get a copy of what I used. Unpack it into the hadoop-book directory with this command:
tar -zxvf hadoop-book-lib.tgz
  4. If you don’t have IntelliJ Community Edition, you can go here to download the version for your operating system. As of this writing the latest version of IntelliJ is 12.0.
  5. If you don’t have Hadoop, you’ll need to download a distribution. For the project in this article I’m using Hadoop 2.0.2-alpha.
  6. Unpack the Hadoop tarball somewhere in your home directory with this command:
tar -zxvf hadoop-2.0.2-alpha.tar.gz
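Before moving on, it can help to confirm that the directories the rest of this procedure relies on are where you expect them. The following is only a sketch: check_layout is a hypothetical helper name, and the paths in the example call are assumptions that you should adjust to your own machine.

```shell
# Hypothetical helper: report which of the directories used in this
# walkthrough exist. Returns non-zero if any of them is missing.
#   $1 = the hadoop-book directory
#   $2 = the unpacked Hadoop directory
check_layout() {
    book_dir=$1
    hadoop_dir=$2
    missing=0
    for dir in "$book_dir/ch03/src" "$book_dir/lib" "$hadoop_dir/share/hadoop"; do
        if [ -d "$dir" ]; then
            echo "ok       $dir"
        else
            echo "MISSING  $dir"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (both paths are assumptions):
# check_layout "$HOME/src/hadoop-book" "$HOME/hadoop-2.0.2-alpha"
```

If a directory is reported as MISSING, revisit the corresponding unpacking step before creating the project.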

Create a Project

For this example we build the apps from Chapter 3.

  1. Start up your IntelliJ IDE.
  2. Select Create New Project in the Quick Start panel.
  3. Set the project type to Java Module.
  4. Set the project name to ch03.
  5. Place the project in your hadoop-book directory which in my case is located here on my Mac:
/Users/vic/src/hadoop-book
  1. Click on More Settings.
  2. Set the project format to .ipr (file based).
  3. Make sure the Create source root box is checked. IntelliJ will use the code that is already located in the ch03/src directory. Then click Next.

[Screenshot: New Project]

  1. We will not add any other coding frameworks, so just click on Finish in the next screen.
  2. The project we just created includes the JDK jars, but we still need to add the Hadoop jars and the library jars that come with the book sample code. Select File > Project Structure.
  3. Click on Modules.
  4. Click on the Dependencies tab, then click on the ‘+’ button at the bottom of the screen.

[Screenshot: Dependencies]

  1. Select  Jars or directories…
  2. Open the hadoop-2.0.2-alpha/share/hadoop directory, select all the subdirectories there, then click OK.

[Screenshot: Add Hadoop jars]

  1. Add another dependency the same way with Jars or directories…, this time selecting the hadoop-2.0.2-alpha/share/hadoop/common/lib directory, then click OK.
  2. Do the same once more, this time selecting the hadoop-book/lib directory, then click OK. Your Project Structure screen should look something like this:

[Screenshot: Project Artifacts]
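Outside the IDE, the same sets of jars can be gathered into a classpath string, which is a handy way to see exactly which jars the dependency steps above pull in. This is a minimal sketch, not what IntelliJ does internally: build_classpath is a hypothetical helper, the HADOOP_DIR and BOOK_DIR variables in the example call are assumptions, and the function only picks up jars sitting directly inside each directory it is given, which mirrors the way common/lib is added as its own entry in the steps above.

```shell
# Hypothetical helper: print a colon-separated classpath made of the
# .jar files found directly inside each directory passed as an argument.
build_classpath() {
    classpath=""
    for dir in "$@"; do
        for jar in "$dir"/*.jar; do
            [ -f "$jar" ] || continue    # glob matched nothing in this directory
            classpath="${classpath:+$classpath:}$jar"
        done
    done
    echo "$classpath"
}

# Example call (HADOOP_DIR and BOOK_DIR are assumed to point at your copies):
# build_classpath "$HADOOP_DIR"/share/hadoop/* \
#                 "$HADOOP_DIR/share/hadoop/common/lib" \
#                 "$BOOK_DIR/lib"
```

Directories without any jars simply contribute nothing, so an empty result means none of the directories you passed contained a jar.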

  1. Now we have to configure the project to build the chapter 3 jar file. Select Artifacts in the  Project Structure screen.
  2. Click on the ‘+’  symbol at the top of the screen.
  3. Select Jar then Empty in the drop down menus.
  4. Set the jar Name to ch03.
  5. Set the Output directory to put the ch03.jar file in the  hadoop-book/ch03  directory.
  6. Initially the ch03.jar has no contents. To add the ch03 class files to the output jar, click on the ‘+’ button in the Output Layout tab and select Module Output. This will add the ‘ch03’ compile output entry under ch03.jar.

[Screenshot: Artifact ch03]

  1. Click on OK.
  2. To build the jar select  Build > Build Artifacts…
  3. You will see a menu that offers four build actions: Build, Rebuild, Clean or Edit… The last of these takes you to the Project Structure – Artifacts screen to choose different build options. Select Rebuild to clean the hadoop-book/ch03 directory and build the ch03.jar.
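After a build, a quick check from the command line can tell you whether ch03.jar exists and is at least as recent as the sources it was built from. The sketch below is an illustration only: jar_is_current is a hypothetical helper, and the paths in the example call assume the jar was written to hadoop-book/ch03 as configured above.

```shell
# Hypothetical helper: succeed only if the jar exists and no .java file
# under the source directory has been modified more recently than the jar.
#   $1 = path to the jar file
#   $2 = source directory to compare against
jar_is_current() {
    jar_file=$1
    src_dir=$2
    [ -f "$jar_file" ] || { echo "no jar at $jar_file"; return 1; }
    stale=$(find "$src_dir" -name '*.java' -newer "$jar_file")
    if [ -n "$stale" ]; then
        echo "sources newer than $jar_file:"
        echo "$stale"
        return 1
    fi
    echo "$jar_file is up to date"
}

# Example call (paths are assumptions):
# jar_is_current "$HOME/src/hadoop-book/ch03/ch03.jar" "$HOME/src/hadoop-book/ch03/src"
```

If the check reports newer sources, run the build again. To see what actually ended up inside the jar, the JDK's jar tf command lists its entries.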

Using this procedure you can create projects for any of the chapters in the Hadoop book source code, which can save you a lot of time when building and experimenting with the example applications.

Author:

Article by Vic Hargrave

Software developer, blogger and family man enjoying life one cup of coffee at a time. I like programming and writing articles on tech topics. And yeah, I like coffee.

12 Comments


    1. Sure enough you are right and I hadn’t noticed. I thought the lib code was originally included, but I suspect now it is pulled down with Maven. However, my attempts to build this stuff with Maven have not been successful. For you and others like us that are struggling a bit with this, I’ve uploaded a copy of my lib code to the site and updated the article to provide a link to it.

      Thanks for pointing this out and visiting my site.

  1. Hi, have you tried it this way: just opening the pom.xml file of the project and running Maven package on it inside the IDE? I still get some errors this way and will be posting to Stack Overflow pretty soon, but I think it has to work that way too.

    1. In order to use the method you suggest for a new project you have to create a pom.xml file for the project. If you pull down code from the Apache repositories you can build the Hadoop jars with Maven since there are pom.xml files that are included with the Hadoop source – see my blog that describes this process. But building Hadoop apps is easier with IntelliJ.

      At any rate thanks for visiting my site.

  2. Hello! Quick question that’s entirely off topic. Do you know how to make your site mobile friendly? My blog looks weird when browsing from my Apple iPhone. I’m trying to find a theme or plugin that might be able to fix this problem. If you have any recommendations, please share. With thanks!

    1. If you are using WordPress then a good mobile ready theme is iFeature. You can also use one theme that is displayed for mobile devices and another that is used for PC displays and switch between the two with a plugin like ‘Any Mobile Theme Switcher’.

      I hope this helps and thanks for visiting my website.

  3. Good post! Thanks for your article. Now I can use IntelliJ to build the Hadoop samples. But maybe I should try to use Eclipse with the hadoop-plugin to test my code.

  4. Tried to download your version of the source code from Tom White and it is giving me a 404. Can you please assist?

  5. With maven in use, the extra ‘lib’s are no longer required. All the dependencies are available in maven central and they are downloaded automagically when the code is built. The reason why folks are having issues is because the pom files are not prepared correctly. I made the fixes and got some chapter codes to work correctly. After doing some more cleanup/fixes, I will share the updates in a couple of days.

    1. OK that sounds great Syed. I’d love to take a look at that. I’m a bit of a maven luddite, but sooner or later I’ll need to figure out how to write pom.xml files.
