1. Copy the Java and Hadoop installables to any folder
where the user has sufficient permissions. In my case I am copying them
under /usr/local/
Download link for Java - http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Download Link for Hadoop - https://archive.apache.org/dist/hadoop/core/hadoop-1.0.0/hadoop-1.0.0.tar.gz
2. Untar/unzip Hadoop and Java using the following commands:
$ sudo tar -xzf hadoop-1.0.0.tar.gz
$ sudo tar -xzf java-7-oracle.tar.gz
3. Rename the extracted folders to meaningful names.
Here I renamed the folders to java and Hadoop.
4. Now it's time to export the environment variables by adding entries to the ~/.bashrc file.
.bashrc is a script that bash runs every time the user opens a new terminal (an interactive shell). It is located
in the home directory of the user.
Append entries to .bashrc that set JAVA_HOME and HADOOP_HOME and add their bin folders to PATH.
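The original entries are not reproduced here, so the lines below are only a minimal sketch of what they might look like. The paths are assumptions: they presume the folders were renamed to java and Hadoop under /usr/local in step 3, so adjust them if your names differ.

```shell
# Assumed install locations: the folders renamed to java and Hadoop under /usr/local.
export JAVA_HOME=/usr/local/java
export HADOOP_HOME=/usr/local/Hadoop
# Put the java and hadoop launchers on the PATH.
export PATH="$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin"
```

Appending to PATH rather than overwriting it keeps the system commands available.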
5. Now close the terminal and start it again so that the .bashrc changes take
effect. You can then run java -version and hadoop version to verify the Java and Hadoop
installations.
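One way to run this check is sketched below. The check_tool helper is a hypothetical name introduced only for this example; it first confirms that the command resolves on the PATH and then asks it for its version.

```shell
# check_tool is a hypothetical helper, not part of Java or Hadoop.
# $1 = command name, remaining arguments = the option that prints its version.
check_tool() {
  tool="$1"
  shift
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool found at $(command -v "$tool")"
    "$tool" "$@"
  else
    echo "$tool not found on PATH" >&2
    return 1
  fi
}

check_tool java -version || true
check_tool hadoop version || true
```

If either command is reported as not found, recheck the PATH entry in .bashrc before moving on.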
6. Once this is done, you can configure ssh. Configuring ssh is a two-step process: first generate a key pair,
then append the public key to the authorized_keys file.
Here are the commands you need to run:
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
7. Once done, the next step is to configure Hadoop. All Hadoop configuration files are
under the $HADOOP_HOME/conf folder.
Hadoop configuration requires changes to the following three files:
1. hadoop-env.sh - In this file we need to set JAVA_HOME.
The file already contains a placeholder for JAVA_HOME that is commented out, so
you just need to find that line, uncomment it, and point it at your Java folder.
2. core-site.xml - This holds the configuration for HDFS. Here we need to set
at minimum two properties, hadoop.tmp.dir and fs.default.name, as shown below.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>
3. mapred-site.xml - This file holds the JobTracker settings.
We should set it up as follows.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.</description>
  </property>
</configuration>
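For the hadoop-env.sh change in item 1 of this step, the uncommented line would look roughly like the sketch below. The path is an assumption based on the java folder under /usr/local from step 3; use the location of your own Java folder.

```shell
# In $HADOOP_HOME/conf/hadoop-env.sh: remove the leading '#' from the
# JAVA_HOME line and set it to the Java folder (assumed path shown here).
export JAVA_HOME=/usr/local/java
```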
8. Now manually create the folder we configured as hadoop.tmp.dir in step 7.2
(/app/hadoop/tmp) and give the user who runs Hadoop full rights to that folder.
9. Now format the NameNode so that it creates the required folder structure, as shown below.
$ hadoop namenode -format
10. And the last step is to start all daemons as shown below.
$ /usr/local/Hadoop/bin/start-all.sh
11. You can verify that all daemons have started by running the jps command, which should list
NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker.
$ jps
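If you would rather not scan the listing by eye, the check can be scripted along the lines of the sketch below. The missing_daemons function is a hypothetical helper written for this example. It reads the jps listing from standard input and prints the name of every expected Hadoop 1.x daemon that is absent.

```shell
# missing_daemons is a hypothetical helper, not a Hadoop command.
# It reads jps output on stdin and prints each expected daemon that is not listed.
missing_daemons() {
  listing=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # -w matches whole words, so NameNode does not match SecondaryNameNode.
    printf '%s\n' "$listing" | grep -qw "$daemon" || echo "$daemon"
  done
}

# Run the check only when jps (shipped with the JDK) is available.
if command -v jps >/dev/null 2>&1; then
  jps | missing_daemons
fi
```

No output means all five daemons are running. Any name that is printed is a daemon to look for in the log files under the Hadoop folder.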