Monday, 15 September 2014

Install Hadoop on Ubuntu – Single-Node Cluster


1.        Copy the Java and Hadoop installation archives to any folder where the user has sufficient permissions. In my case, I am copying them to /usr/local/.





2.       Untar (unzip) Hadoop and Java using the following commands:
$sudo tar -xzf hadoop-1.0.0.tar.gz
$sudo tar -xzf java-7-oracle.tar.gz



3.       Rename the extracted folders to meaningful names.
Here I renamed the folders to java and Hadoop.
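The post does not show the folder names that tar creates, and the name inside the Java archive in particular depends on the download. As a minimal sketch, a hypothetical top_dir helper can read the top-level folder from each archive so the rename does not rely on guessing. It uses only tar -tzf, sed, head and cut.

```shell
# Hypothetical helper: print the top-level folder stored in a .tar.gz archive
top_dir() {
  # List the archive, drop a leading "./", skip empty lines, keep the first path component
  tar -tzf "$1" | sed -e 's|^\./||' -e '/^$/d' | head -n 1 | cut -d/ -f1
}

# Usage, from /usr/local where the archives were extracted in step 2:
# sudo mv "$(top_dir hadoop-1.0.0.tar.gz)" Hadoop
# sudo mv "$(top_dir java-7-oracle.tar.gz)" java
```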



4.       Now it's time to export the environment variables. To do so, open the ~/.bashrc file, which is located in the user's home directory, in a text editor. .bashrc is a script that Bash runs every time a new interactive shell starts, for example when you open a terminal.



You can append the following entries to the .bashrc file.
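The exact entries are not reproduced in the post. A minimal sketch, assuming the folders from steps 1 to 3 (/usr/local/java and /usr/local/Hadoop): JAVA_HOME and HADOOP_HOME point at those folders, and their bin directories are added to PATH so that java and hadoop can be run from any directory.

```shell
# Assumed install locations from steps 1-3; adjust if yours differ
export JAVA_HOME=/usr/local/java
export HADOOP_HOME=/usr/local/Hadoop
# Put the java and hadoop launchers on the PATH
export PATH="$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin"
```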



5.       Now close the terminal and open it again so that the .bashrc changes take effect (or run source ~/.bashrc in the current terminal). You can then verify the Java and Hadoop installation by running java -version and hadoop version.
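If either command fails, the usual cause is a wrong path in ~/.bashrc. The sketch below wraps the two version checks in a hypothetical check_tool helper that first reports whether each command is on the PATH; java -version and hadoop version are the standard commands, the helper is only an illustration.

```shell
# Hypothetical helper: report whether a command is on the PATH
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found $1 at $(command -v "$1")"
  else
    echo "$1 not found on PATH; check the exports in ~/.bashrc" >&2
    return 1
  fi
}

# Print the versions only if the commands can be found
check_tool java && java -version
check_tool hadoop && hadoop version
```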


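6.       Once this is done, you can configure SSH. Hadoop's start scripts use SSH to launch the daemons, even when every host is localhost, so the user must be able to run ssh localhost without a password (on Ubuntu this also requires the openssh-server package). Configuring SSH is a two-step process: first generate a key pair, then append the public key to the authorized_keys file.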


Here are the commands you need to run:

$ssh-keygen -t rsa -P ""
$cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
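Running the cat command twice appends the same key twice. A hypothetical append_key_once helper avoids that by appending the public key only when authorized_keys does not already contain it, and it restricts the file to mode 600. Afterwards, ssh localhost should log you in without a password (accept the host key the first time).

```shell
# Hypothetical helper: append a public key to authorized_keys only once
append_key_once() {
  pub="$1"
  auth="$2"
  mkdir -p "$(dirname "$auth")"
  touch "$auth"
  chmod 600 "$auth"   # only the owner may read or write the file
  # -F fixed string, -x whole line, -f read the pattern from the public key file
  if ! grep -qxF -f "$pub" "$auth"; then
    cat "$pub" >> "$auth"
  fi
}

# Usage:
# append_key_once "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```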

7.       Once done, the next step is to configure Hadoop. All Hadoop configuration files are in the $HADOOP_HOME/conf folder.
The configuration requires changes to the following three files:

1.       hadoop-env.sh - In this file we need to set JAVA_HOME. The file already contains a commented-out placeholder for JAVA_HOME, so you just need to find that line, uncomment it, and set it to your Java folder (/usr/local/java in my case).
2.       core-site.xml – This file holds the core settings, including where HDFS runs. At a minimum, configure two properties, hadoop.tmp.dir and fs.default.name, as shown below:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

</configuration>

3.       mapred-site.xml - This file holds the MapReduce settings, including the JobTracker address. Set it as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

</configuration>
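The hadoop-env.sh edit from step 7.1 can also be scripted. A minimal sketch, assuming the placeholder is a commented line beginning with "# export JAVA_HOME=" and that Java lives in /usr/local/java; set_java_home is a hypothetical helper name. sed -i.bak works with both GNU and BSD sed and keeps a hadoop-env.sh.bak backup.

```shell
# Hypothetical helper: uncomment (or replace) the JAVA_HOME line in hadoop-env.sh
set_java_home() {
  env_file="$1"
  java_dir="$2"
  # Matches "# export JAVA_HOME=..." as well as an already uncommented line
  sed -i.bak "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=$java_dir|" "$env_file"
}

# Usage:
# set_java_home "$HADOOP_HOME/conf/hadoop-env.sh" /usr/local/java
```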


8.       Now manually create the hadoop.tmp.dir folder configured in step 7.2 (/app/hadoop/tmp) and give the user who runs Hadoop full rights to it.
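A minimal sketch of step 8, assuming the /app/hadoop/tmp path from core-site.xml and mode 750 (full rights for the owner, read access for its group, none for others). prepare_tmp_dir and the SUDO variable are hypothetical names; SUDO defaults to sudo and can be set to an empty value when you are already root.

```shell
# Hypothetical helper: create the hadoop.tmp.dir folder and hand it to the Hadoop user.
# SUDO defaults to "sudo"; set SUDO= (empty) if you are already running as root.
prepare_tmp_dir() {
  dir="$1"
  owner="$2"
  run=${SUDO-sudo}
  $run mkdir -p "$dir"        # create the folder and any missing parents
  $run chown "$owner" "$dir"  # the Hadoop user owns it
  $run chmod 750 "$dir"       # owner: full rights; group: read/execute; others: none
}

# Usage, with the value of hadoop.tmp.dir from core-site.xml:
# prepare_tmp_dir /app/hadoop/tmp "$USER"
```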



9.       Now format the NameNode so that it creates the required folder structure, as shown below. Do this only once: formatting again erases any data already stored in HDFS.

$hadoop namenode -format



10.    The last step is to start all daemons, as shown below:
$/usr/local/Hadoop/bin/start-all.sh




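11.    You can verify that all daemons have started by running the jps command. On a single-node Hadoop 1.x setup it should list NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker (plus Jps itself).

$jps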
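To make the check in step 11 explicit, a hypothetical check_daemons function can read jps output (one "pid ClassName" line per Java process) and name any of the five Hadoop 1.x daemons that is missing. It is a sketch that compares class names exactly, so SecondaryNameNode is not mistaken for NameNode.

```shell
# Hypothetical helper: read jps output on stdin and report missing Hadoop 1.x daemons
check_daemons() {
  out=$(cat)
  missing=0
  for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
    # jps prints "<pid> <MainClassName>", so compare the second field exactly
    if ! printf '%s\n' "$out" | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }'; then
      echo "missing: $d"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# jps | check_daemons && echo "all daemons running"
```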

