How to Provide Elasticity to Hadoop Using Python Automation

Aditya N
4 min read · Nov 22, 2020


💠 In this article we will see how to provide the desired storage to the Hadoop NameNode using the concept of LVM, with Python for automation 💠

LVM

What is LVM and why is it needed?

LVM stands for Logical Volume Manager.

Managing disk space has always been a significant task for sysadmins. Running out of disk space used to be the start of a long and complex series of tasks to increase the space available to a partition, and it remained a difficult job until the arrival of LVM.

Steps required to set up LVM:

  1. Physical Volume
  2. Volume Group
  3. Logical Volume
LVM infrastructure

Physical Volume:

Physical block devices or other disk-like devices (for example, other devices created by device-mapper, like RAID arrays) are used by LVM as the raw building material for higher levels of abstraction.

Volume Group:

The Volume Group is the highest level abstraction used within the LVM. It gathers together a collection of Logical Volumes and Physical Volumes into one administrative unit.

Logical Volume:

The equivalent of a disk partition in a non-LVM system. The LV is visible as a standard block device; as such, the LV can contain a file system.

Example: the root (/) volume is often built on an LV, so that if we need to extend it in the future we can do so dynamically.

With the basics covered, let us see how we can implement LVM on the Hadoop NameNode (master) and provide elasticity so that the master node can scale according to the data requirement.

Hadoop logs

In the above figure we can see that the entire root volume is being contributed to the master node. In the coming steps we will see how to provide a specified amount of storage to the master node and scale it to our requirement using the concept of LVM discussed above.

For this demo I will be attaching two hard disks from my system:

One is of 4 GB and the other of 2 GB. Once they are attached successfully, we will create the physical volumes.

Attached volume

pvcreate <attached volume name>

PV
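Since the end goal is automation, the same step can also be driven from Python. Below is a minimal sketch using the subprocess module; the device names /dev/sdb and /dev/sdc are assumptions for illustration, so substitute the names that lsblk or fdisk -l reports on your system.

import subprocess

# Initialize each attached disk as an LVM physical volume
for disk in ["/dev/sdb", "/dev/sdc"]:
    subprocess.run(["pvcreate", disk], check=True)

# Verify the physical volumes
subprocess.run(["pvdisplay"], check=True)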

Creating a volume group from the above-created PVs:

vgcreate <volume group name> <PV name>

To see the VG created we can use

vgdisplay

vgdisplay
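The same two commands can be scripted as well; a minimal sketch, assuming the volume group is named hadoop_vg and the PVs are /dev/sdb and /dev/sdc:

import subprocess

# Gather the physical volumes into one volume group
subprocess.run(["vgcreate", "hadoop_vg", "/dev/sdb", "/dev/sdc"], check=True)

# Confirm the volume group details
subprocess.run(["vgdisplay", "hadoop_vg"], check=True)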

Once the VG is created, the main part is to create our LV, which can be done with:

lvcreate --size <SIZE>G --name <NAME> <volume group name>

Here we have created an LV of 3 GB named hadoop.

LV creation
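The LV creation can also be scripted; a minimal sketch, assuming the volume group from the previous step is named hadoop_vg:

import subprocess

# Carve a 3 GB logical volume named "hadoop" out of the volume group
subprocess.run(
    ["lvcreate", "--size", "3G", "--name", "hadoop", "hadoop_vg"],
    check=True)

# Confirm the logical volume details
subprocess.run(["lvdisplay", "/dev/hadoop_vg/hadoop"], check=True)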

Once the job of creating the LV is done, we need to format it using mkfs.ext4 and mount it to the respective folder; in our case that is the Hadoop master folder named /dn1.

Mounting Operation
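Scripted, the format-and-mount step might look like this minimal sketch (the LV path assumes the hadoop_vg/hadoop names used above; note that mkfs.ext4 erases anything already on the LV):

import os
import subprocess

lv_path = "/dev/hadoop_vg/hadoop"  # assumed VG/LV names from the steps above

# Format the fresh LV with ext4 (destroys any existing data on it)
subprocess.run(["mkfs.ext4", lv_path], check=True)

# Create the mount point used by the Hadoop master and mount the LV on it
os.makedirs("/dn1", exist_ok=True)
subprocess.run(["mount", lv_path, "/dn1"], check=True)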

Now it is time to check how our Hadoop NameNode is configured with the new LV. Let's see!!! 🤔

we can check it using

hadoop dfsadmin -report

hadoop logs

Yippee! We can see our configured LV of 3 GB attached to our NameNode 🥳
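If we want the script to verify this for us, a small sketch can run the report and pull out just the capacity lines (matching on the "Configured Capacity" substring is an assumption about the report format, which may vary across Hadoop versions):

import subprocess

# Run the report and print only the capacity lines
report = subprocess.run(["hadoop", "dfsadmin", "-report"],
                        capture_output=True, text=True, check=True).stdout
for line in report.splitlines():
    if "Configured Capacity" in line:
        print(line.strip())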

NOW it's time to completely automate the same process using Python scripting, which makes it MUCH EASIER.

In this part we will see how we can automate the above process on the fly and make it much simpler.

I have written a Python script for it with a menu-based approach to make it user friendly.

Python script
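My actual script is shown in the screenshot above; as a rough idea of the menu-based approach, here is a minimal sketch (not the exact script: the option numbers, prompts, and the hadoop_vg/hadoop names are illustrative assumptions):

import subprocess

def run(cmd):
    # Thin wrapper so every menu option raises on failure
    subprocess.run(cmd, check=True)

MENU = """
1. Create physical volume
2. Create volume group
3. Create logical volume
8. Extend /dn1 logical volume
9. Exit
"""

while True:
    print(MENU)
    choice = input("Enter your choice: ").strip()
    if choice == "1":
        run(["pvcreate", input("Disk to initialize (e.g. /dev/sdb): ")])
    elif choice == "2":
        run(["vgcreate", input("VG name: "), input("PV name: ")])
    elif choice == "3":
        run(["lvcreate", "--size", input("Size (e.g. 3G): "),
             "--name", input("LV name: "), input("VG name: ")])
    elif choice == "8":
        size = input("Extend by (e.g. +0.5G): ")
        run(["lvextend", "--size", size, "/dev/hadoop_vg/hadoop"])
        run(["resize2fs", "/dev/hadoop_vg/hadoop"])
    elif choice == "9":
        break
    else:
        print("Option not implemented in this sketch")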

Now we can extend the size of /dn1 using the concept of LVM with option 8 in our menu program.

Once that option is selected, the Python script takes care of the difficult part and increases the size of /dn1 on the go.

One of the main things to keep in mind: as discussed above, formatting with mkfs.ext4 erases all the data on the volume. In this use case we need to keep our data while growing the volume, so instead of reformatting we extend the LV and then resize the existing filesystem in place.

we need to use:

lvextend --size +<SIZE>G <logical volume path>

resize2fs <logical volume path>
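In script form, the extend step boils down to these two calls (a minimal sketch; the LV path again assumes the hadoop_vg/hadoop names from earlier):

import subprocess

lv_path = "/dev/hadoop_vg/hadoop"

# Grow the logical volume by 0.5 GB while it stays mounted
subprocess.run(["lvextend", "--size", "+0.5G", lv_path], check=True)

# Grow the ext4 filesystem online to fill the enlarged LV
subprocess.run(["resize2fs", lv_path], check=True)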

Now let's check whether the storage of the folder connected to our Hadoop NameNode has increased:

hadoop logs

Here we can see that the volume has been successfully increased by 0.5 GB 🥳

which we achieved using Python scripting!!!

Documentation reference:

Red Hat Documentation
