Saturday, June 25, 2016

Hadoop Installation (Distributed Mode in Ubuntu)

Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.

You can install Hadoop in any of three modes:

 a) Standalone (local) mode
 b) Pseudo-distributed mode
 c) Distributed cluster mode

This article covers the distributed mode installation (on Ubuntu) only.

1. To install Hadoop you need the following software (both can be downloaded free):

  • hadoop-2.7.2.tar.gz
  • jdk-8u77-linux-i586.tar.gz

Download and extract these files (they are compressed archives). This article assumes both are placed in Downloads (you can place them anywhere).
For simplicity, I have renamed the extracted hadoop-2.7.2 folder to hadoop.

2. Update Ubuntu:

user@user-ThinkCentre-E73:~$ sudo apt update

  3. Now, install the OpenSSH server on your system.
     Use the following command and press Enter (enter your password if asked):
     $ sudo apt-get install openssh-server
 4. Create the master and slave node entries. Use the following command:
    $ sudo gedit /etc/hosts
  In the hosts file that opens, add the master and slave IP addresses:
  192.168.1.52    master
  192.168.1.55    slave1
  Note: your IP addresses may differ, but both machines must be on the same network. Save and exit.

5. Change the hostname:
   $ sudo gedit /etc/hostname
   (Enter your password if the terminal asks for it.)
   In the hostname file that opens, remove the old name and type the new name, master. Save and exit.

 6. Apply the new hostname:
   $ sudo service hostname restart
   (On newer Ubuntu releases this service no longer exists; sudo hostnamectl set-hostname master, or simply rebooting, has the same effect.)
   Now close the terminal and reopen it. The prompt now shows the new name, master.

 7. Use the following command to generate an SSH key pair:
   $ ssh-keygen -t rsa
   Press Enter three times (accepting the defaults). A new key is generated.

   Now use the following command to copy the public key to the slave:
   $ ssh-copy-id -i /home/user/.ssh/id_rsa.pub user@slave1
  


  8. Open a new terminal on the slave system and make sure the OpenSSH server is installed there too:
      $ sudo apt-get install openssh-server
  
  9. We can check whether passwordless login to the slave works by typing the following command in a new terminal on the master system:
     $ ssh slave1
   



 10. Configuration of Hadoop on the master.
   Go to Downloads/hadoop/etc/hadoop/ (the hadoop folder on your system).
   Open the core-site.xml file with gedit.
   Add the required property at the end of the file, inside the configuration tag (set the name and value tags).
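The original screenshot of this file is missing, so here is a typical core-site.xml for a two-node setup like this one; the hostname master matches the hosts file above, and port 9000 is a common (assumed) choice:

```xml
<configuration>
  <!-- URI of the default filesystem: the NameNode on the master host -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
```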
   

  11. Now open hdfs-site.xml in the same directory.

      Add the required properties at the end of the file, inside the configuration tag.
        
 12. Copy mapred-site.xml.template, paste it in the same directory, and rename the copy to mapred-site.xml.
     Open the renamed file with gedit and make the required changes at the end of the file.
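The screenshot of this file is also missing; for Hadoop 2.x the usual change is to tell MapReduce to run on YARN, so a plausible mapred-site.xml is:

```xml
<configuration>
  <!-- Run MapReduce jobs on the YARN framework (standard for Hadoop 2.x) -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```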

     
 13. Open the hadoop-env.sh file, which is in the same directory, and add this line at the end of the file:

     export HADOOP_CONF_DIR=/home/user/Downloads/hadoop/etc/hadoop
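Note that hadoop-env.sh also needs JAVA_HOME pointing at the extracted JDK, or the daemons will not start. A sketch of the relevant lines (the JDK folder name is an assumption based on jdk-8u77; check what your extraction produced):

```shell
# Path to the extracted JDK (assumed folder name for jdk-8u77)
export JAVA_HOME=/home/user/Downloads/jdk1.8.0_77
# Path to the Hadoop configuration directory used in this article
export HADOOP_CONF_DIR=/home/user/Downloads/hadoop/etc/hadoop
```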

 14. Copy the files to the slave.

  Open a terminal on the master system and change the directory:
  $ cd Downloads
  Now type the command to copy:
  $ scp -r hadoop slave1:/home/user/Downloads/
  (The above command copies all the configured Hadoop files to the slave system.)

 15. Go to the Hadoop configuration directory and open the slaves file with gedit:
     Downloads/hadoop/etc/hadoop/
     Remove localhost from the file and type slave1. Save and exit.

 16. In the same directory, copy the slaves file, paste it there, and rename the copy to masters.
  Now open the masters file, remove slave1, and type master. Save and exit.

 17. NameNode formatting:
    Use the following command:
  $ hdfs namenode -format
  (The older form, hadoop namenode -format, still works in Hadoop 2.x but is deprecated.)

 18. In the terminal, type the following command to start all Hadoop services:
   $ start-all.sh
  (The script lives in hadoop/sbin; run it as ./sbin/start-all.sh or add sbin to your PATH.)
  It may ask you for your password (twice).

 19. In a terminal on the master system, try the following command:
   $ jps
  If the installation is correct, it must show the master daemons, for example (ResourceManager also appears when YARN is started):

  NameNode
  SecondaryNameNode
  Jps

 20. In a terminal on the slave system, if we type
  $ jps
  it displays the following (NodeManager also appears when YARN is started):

  DataNode
  Jps

 Now, you can execute Java JAR files (MapReduce jobs) in the Hadoop distributed environment.
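As a quick sanity check, you can run the wordcount demo from the examples JAR that ships with the Hadoop 2.7.2 distribution (the input file and HDFS paths below are just an illustration):

```shell
# Put a sample file into HDFS (here, one of the config files we just edited)
hdfs dfs -mkdir -p /input
hdfs dfs -put /home/user/Downloads/hadoop/etc/hadoop/core-site.xml /input

# Run the bundled wordcount example across the cluster
hadoop jar /home/user/Downloads/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar \
    wordcount /input /output

# Inspect the result
hdfs dfs -cat /output/part-r-00000
```

If the job completes and the output lists words with their counts, the distributed setup is working.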
