Hadoop 2.x Architecture
The default block size in 2.x is 128 MB
The replication factor is 3
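As a rough worked example of these two values, the sketch below computes how many blocks a file is split into and how much raw storage its replicas take up. The function name and the 500 MB file are made up for illustration; only the 128 MB block size and the replication factor of 3 come from the notes.

```python
import math

BLOCK_SIZE_MB = 128      # block size stated above for Hadoop 2.x
REPLICATION_FACTOR = 3   # replication factor stated above


def block_layout(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                 replication=REPLICATION_FACTOR):
    """Return (number of blocks, size of last block in MB, raw storage in MB)."""
    if file_size_mb == 0:
        return 0, 0, 0
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    # The last block only holds whatever data is left over.
    last_block_mb = file_size_mb - (num_blocks - 1) * block_size_mb
    # Every block is stored `replication` times across the data nodes.
    raw_storage_mb = file_size_mb * replication
    return num_blocks, last_block_mb, raw_storage_mb


print(block_layout(500))  # (4, 116, 1500)
```

So a 500 MB file becomes three full 128 MB blocks plus one 116 MB block, and the cluster stores 1500 MB in total for it.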
Here we have 3 name nodes:
Standby NameNode
Active NameNode
Secondary NameNode
The data nodes are connected to both the standby name node and the active name node
As we know, the data nodes send their heartbeats to the name node, but the problem here is which name node they should send them to, since each data node has two connections
The solution is that the active name node holds a file called in_userlock.
Thus the data nodes send their heartbeats to the name node that holds the in_userlock file.
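The routing rule above can be pictured with a small toy model. This is only a sketch of the idea as described in these notes, not HDFS code: the classes, the has_lock flag (standing in for the in_userlock file) and the node names are all hypothetical.

```python
class NameNode:
    def __init__(self, name, has_lock=False):
        self.name = name
        self.has_lock = has_lock   # stands in for the in_userlock file
        self.heartbeats = []       # names of data nodes heard from


class DataNode:
    def __init__(self, name, name_nodes):
        self.name = name
        self.name_nodes = name_nodes  # connected to both name nodes

    def send_heartbeat(self):
        """Send a heartbeat to the name node holding the lock; return its name."""
        for nn in self.name_nodes:
            if nn.has_lock:
                nn.heartbeats.append(self.name)
                return nn.name
        return None  # no name node currently holds the lock


active = NameNode("nn1", has_lock=True)
standby = NameNode("nn2")
dn = DataNode("dn1", [standby, active])
print(dn.send_heartbeat())  # nn1
```

Even though the data node is connected to both name nodes, only the one holding the lock records the heartbeat.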
ZooKeeper coordinates the name nodes and decides which one is active
The journal nodes keep the active and standby name nodes in sync
The active name node writes its edit log entries to the journal nodes, and the standby name node reads them from there to stay in sync. Both happen continuously
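The write-then-read flow through the journal node can be sketched as a toy model. Everything here is an assumption made for illustration: a single in-memory journal, a namespace reduced to a set of paths, and only "create" and "delete" operations. Real HDFS uses several journal nodes and a much richer edit log.

```python
class JournalNode:
    def __init__(self):
        self.edits = []  # ordered list of (txid, operation, path)

    def append(self, txid, op, path):
        self.edits.append((txid, op, path))

    def read_from(self, txid):
        """Return all edits with a transaction id greater than txid."""
        return [e for e in self.edits if e[0] > txid]


def apply_edit(namespace, op, path):
    if op == "create":
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)


class ActiveNameNode:
    def __init__(self, journal):
        self.journal = journal
        self.namespace = set()
        self.last_txid = 0

    def log_edit(self, op, path):
        # Apply the change locally and record it in the shared journal.
        self.last_txid += 1
        apply_edit(self.namespace, op, path)
        self.journal.append(self.last_txid, op, path)


class StandbyNameNode:
    def __init__(self, journal):
        self.journal = journal
        self.namespace = set()
        self.last_applied_txid = 0

    def tail_edits(self):
        # Replay every edit not applied yet, in order.
        for txid, op, path in self.journal.read_from(self.last_applied_txid):
            apply_edit(self.namespace, op, path)
            self.last_applied_txid = txid


journal = JournalNode()
active = ActiveNameNode(journal)
standby = StandbyNameNode(journal)

active.log_edit("create", "/a")
active.log_edit("create", "/b")
active.log_edit("delete", "/a")
standby.tail_edits()
print(standby.namespace == active.namespace)  # True
```

Because the standby replays the same edits in the same order, its view of the namespace matches the active one and it is ready to take over.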
ZooKeeper works with two failover controllers: one for the active name node and one for the standby name node
Whenever the active name node fails, the failover controllers detect it and the standby name node becomes active.
Since the data nodes look for the in_userlock file, it will now be present on the former standby node, which starts receiving the heartbeats of the data nodes
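The failover described above can be sketched as a toy simulation. This is a simplified model under stated assumptions, not the real ZooKeeper or failover controller API: the Coordinator class stands in for ZooKeeper, its lock_holder field plays the role of the in_userlock file, and the healthy flag is set by hand instead of by real health checks.

```python
class Coordinator:
    """Stands in for ZooKeeper: at most one name node holds the lock."""

    def __init__(self):
        self.lock_holder = None

    def try_acquire(self, name):
        if self.lock_holder is None:
            self.lock_holder = name
        return self.lock_holder == name

    def release(self, name):
        if self.lock_holder == name:
            self.lock_holder = None


class NameNode:
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.state = "standby"


class FailoverController:
    """One controller per name node; it reacts to that node's health."""

    def __init__(self, name_node, coordinator):
        self.name_node = name_node
        self.coordinator = coordinator

    def check(self):
        nn = self.name_node
        if not nn.healthy:
            # A failed node gives up the lock so the other one can take it.
            self.coordinator.release(nn.name)
            nn.state = "down"
        elif self.coordinator.try_acquire(nn.name):
            nn.state = "active"
        else:
            nn.state = "standby"


zk = Coordinator()
nn1, nn2 = NameNode("nn1"), NameNode("nn2")
fc1, fc2 = FailoverController(nn1, zk), FailoverController(nn2, zk)

fc1.check(); fc2.check()
print(nn1.state, nn2.state)  # active standby

nn1.healthy = False          # the active name node fails
fc1.check(); fc2.check()
print(nn1.state, nn2.state)  # down active
```

After the failure the lock has moved to nn2, so in this model the data nodes would now send their heartbeats to nn2.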
When the engineers bring the original name node back, it will now behave as the standby node.
But the problem that arises is that both name nodes, the returning one and the currently active one, will now have the in_userlock file
Such a situation is called a split-brain scenario
With the help of fencing, the engineers remove the in_userlock file from the current standby node, which was originally the active name node
Now its failover controller is set to standby, and this node, which was originally the active name node writing the edit logs, starts reading them instead
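Split-brain detection and fencing can be sketched with a toy model. The dictionary of lock flags and the function names are hypothetical and only illustrate the rule that exactly one name node may keep the in_userlock file. Real fencing is configured in HDFS and is not shown here.

```python
def lock_holders(name_nodes):
    """Return the names of all name nodes that currently hold the lock file."""
    return [name for name, has_lock in name_nodes.items() if has_lock]


def is_split_brain(name_nodes):
    # More than one lock holder means two nodes both believe they are active.
    return len(lock_holders(name_nodes)) > 1


def fence(name_nodes, current_active):
    """Return a new state where only current_active keeps the lock file."""
    return {name: (has_lock and name == current_active)
            for name, has_lock in name_nodes.items()}


# nn1 was the original active node and has come back; nn2 took over meanwhile.
state = {"nn1": True, "nn2": True}
print(is_split_brain(state))   # True

state = fence(state, current_active="nn2")
print(state)                   # {'nn1': False, 'nn2': True}
print(is_split_brain(state))   # False
```

After fencing, only nn2 holds the lock, so nn1 can safely continue as the standby node.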
The secondary name node fetches the fsimage and the edit logs at regular intervals.
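What is typically done with the fetched fsimage and edit logs is a checkpoint: the edits are replayed on top of the image to produce a newer image. The sketch below is a toy version of that merge. The dictionary layout, the "create" and "delete" operations and the function name are assumptions for illustration, not the real fsimage format.

```python
def checkpoint(fsimage, edit_log):
    """Merge edits newer than the image into a new image.

    fsimage:  {"paths": list of paths, "last_txid": int}
    edit_log: ordered list of (txid, operation, path)
    """
    namespace = set(fsimage["paths"])
    last_txid = fsimage["last_txid"]
    for txid, op, path in edit_log:
        if txid <= last_txid:
            continue  # already contained in the image
        if op == "create":
            namespace.add(path)
        elif op == "delete":
            namespace.discard(path)
        last_txid = txid
    return {"paths": sorted(namespace), "last_txid": last_txid}


old_image = {"paths": ["/a", "/b"], "last_txid": 2}
edits = [(1, "create", "/a"), (2, "create", "/b"),
         (3, "delete", "/a"), (4, "create", "/c")]
print(checkpoint(old_image, edits))
# {'paths': ['/b', '/c'], 'last_txid': 4}
```

The new image already contains every edit up to transaction 4, so the older edit log entries are no longer needed to rebuild the namespace.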
Hence the single point of failure problem of the 1.x architecture is eliminated
example: